
Data & Knowledge Engineering 122 (2019) 159–180


General representational automata using deep neural networks


Johnpaul C.I. a,b, Munaga V.N.K. Prasad b, S. Nickolas a, G.R. Gangadharan a,∗
a National Institute of Technology, Tiruchirappalli, India
b Institute for Development and Research in Banking Technology, Hyderabad, India
∗ Corresponding author. E-mail addresses: johnpaul.ci@gmail.com (Johnpaul C.I.), mvnkprasad@idrbt.ac.in (M.V.N.K. Prasad), nickolas@nitt.edu (S. Nickolas), ganga@nitt.edu (G.R. Gangadharan).

ARTICLE INFO

Keywords: Powerset; Representational learning; Automata; Unsupervised; Unlabeled; Categorical; Transition; Renewable energy; Bankruptcy

Article history: Received 11 December 2018; Received in revised form 6 June 2019; Accepted 10 June 2019; Available online 13 June 2019. https://doi.org/10.1016/j.datak.2019.06.004

ABSTRACT

Unlabeled data representation constitutes a major challenge in data mining. Different unsupervised learning methods such as clustering and dimensionality reduction form the basis of data representations. The impact of attribute combinations and their interactions on data is less addressed by such models. A representation model supported with machine learning concepts can reveal more information about the nature of underlying data. We herein present a novel unsupervised minimum attribute instance selection (UMAIS) labeling algorithm that selects a categorical attribute as a class label, and a novel attribute-based powerset generation (APSG) algorithm for describing the formation of relevant attribute sets using correlation and powerset. Using these algorithms, we present a diagrammatic representation known as Representational Automata that depict the importance of interactions among correlated and non-correlated attributes present in an unlabeled dataset. We performed experiments using two large-scale datasets from the energy and financial domains and compared our approach with other standard classifiers. Our approach obtains a significantly better classification accuracy of 92.187% and 87.32% for the energy and financial datasets, respectively, compared to 74% and 82% for the linear classifier.

1. Introduction

Data representation is an auxiliary process in data mining that helps in understanding the hidden structure of data instances. Information in data instances is confined to the attribute values. Some of the facts that can be obtained from the attributes include their relationships, growth trend, nature of clustering, frequency of attribute values, and correlation. This valuable information can be used in visualizing the data representation if it is interpreted and analyzed with appropriate machine learning algorithms. Generally, two categories of attributes, namely categorical and numerical attributes, exist in labeled and unlabeled data. In the case of labeled data, a designated attribute known as the class label exists, and is used primarily in supervised learning methods such as regression and classification [1].
Artificial neural networks (ANNs) are typically used for classification tasks. They contain a set of weights which denote how strongly each of the neurons impacts the others. A neuronal output is determined by the cumulated sum of weights and their respective inputs. If the cumulated sum is less than the threshold value, then the output is zero; otherwise, the output is one. The loss function, which is the difference between the predicted and expected values, is used to update the weights of each input over a specific number of iterations [2]. The classification accuracies of the deep neural network (DNN), feed forward neural network (FFNN), and linear classifiers are affected by the number of instances in each class. If a sufficient number of training instances exists for each class, then the classification accuracy is justifiable. If the number of instances in a class is less than the threshold level, then the classifier
fails to generate the evaluation metrics. One solution to this paradox is to group sparsely populated classes into a common group [3]. This algorithmic modification allows unsupervised labeling methods to proceed with classification. Unlabeled data is devoid of any designated class label. A labeling algorithm groups the instances into classes and makes the dataset ready for other algorithms to act upon it. To obtain the structure of or information about such data through a representational model, it is essential to start with labeling using unsupervised learning methods. We herein propose a method for labeling and visualizing attribute interactions using DNNs.
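To make the above description concrete, the following is a minimal NumPy sketch (not from the paper) of a single thresholded neuron and a perceptron-style weight update; the threshold, learning rate, and data values are illustrative assumptions.

import numpy as np

def neuron_output(weights, inputs, threshold=0.5):
    # Output is one only if the cumulated sum of weighted inputs reaches the threshold.
    return 1 if np.dot(weights, inputs) >= threshold else 0

def update_weights(weights, inputs, expected, lr=0.1):
    # The loss is the difference between the expected and predicted output; it drives the weight update.
    loss = expected - neuron_output(weights, inputs)
    return weights + lr * loss * inputs

w = np.zeros(2)
x, y = np.array([0.4, 0.9]), 1
for _ in range(10):                 # a specific number of iterations
    w = update_weights(w, x, y)
print(neuron_output(w, x))          # prints 1 once the weighted sum crosses the threshold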
Unsupervised machine learning is a broad category of machine learning algorithms, in which the input instances are not classified
into any class or label [4]. Human learning is primarily unsupervised in nature. Such learning could occur only after repeated trials
or self-decision, whose impact is either favorable or unfavorable. This systematic learning mechanism leads to appropriate decisions and conclusions. All unsupervised learning methods aim to obtain a representational model from the existing data. This model is
later used as the base platform for further learning processes [5]. Models from the existing data are obtained by observing various
patterns and association of patterns [6]. The frequent occurrence of such patterns in the data clearly elucidates the data structure.
Thus, unsupervised machine learning algorithms aim to obtain various patterns from the unlabeled data that are subsequently used
for classification or clustering methods [7]. In unsupervised learning, a new pattern in the data may occur at any time. Although its
impact can be high or low, it has to be regarded as a new feature or pattern with utmost importance. Partially or fully ignoring such patterns will result in missed conclusions and will affect the subsequent decision-making process. Grouping based on patterns
or similarity will result in clusters that must be evaluated by various metrics [6,8]. In our proposed representational automata, each
attribute combination is regarded as a pattern. One has to ensure that each group contains sufficient data to support the classification
properties. In some cases, the data quantity may be insufficient, and thus cannot be used by semi-supervised algorithms to perform
training and testing. In such cases, the association of two or more groups must be considered to obtain an optimized result. The
initial methods of pattern identification from data are expensive in terms of execution time.
Currently, a significant amount of structured or unstructured data is generated. Mining such data using various methods will
provide an insight into the functioning of the system. Apart from prediction, forecasting, classification, and clustering of such data,
it can be efficiently used to generate formal representational models that can be analyzed subsequently using mathematical and
formal verification methods [9,10]. These methods are advantageous as they can fill design gaps, such as the need for additional parameters, fine-tuning of model characteristics, and relationships between model parameters, by inspecting the model properties carefully. Traditional data mining techniques include data analysis with tools or frameworks, thereby leading to various forms of supervised or unsupervised learning methods. A representational model should expose the real behavior of attributes such that the analyst obtains an entire picture of the dataset and of the factors that determine the accuracy of traditional learning methods [8]. Such an insight
will help the analyst to design goal-oriented data mining approaches that can converge into results quickly. It is also beneficial to
concentrate on the relationship of attributes whose changes will improve the accuracy of the learning methods [11].
Let D be an instance of data that contains attributes 𝐴1 , 𝐴2 , 𝐴3 , 𝐴4 , . . . 𝐴𝑛 . The grouping of relevant attributes in the data is useful
for obtaining a relationship among them [12]. The grouping procedures focus on a smaller number of attributes instead of considering all the attributes. More than one candidate can be selected as a label. Identifying an attribute to be selected as a label is essential for supervised learning methods. Designing a general representational model based on the relationship of attributes is challenging; machine learning algorithms can be used to generate such a model by acquiring the relationship of
attributes. For an unlabeled dataset with numerical and categorical attributes, it is necessary to identify the method to be adopted
for labeling. Traditionally, experts choose clustering algorithms to identify groups of similar data objects. K-means clustering is used
primarily for this purpose [13]. However, if the attributes and instances increase, then K-means clustering becomes extremely costly
in terms of proximity calculation [14,15]. Moreover, K-means is extremely iterative in nature. Obtaining a proper labeling algorithm
is a bottleneck. When the number of categorical attributes increases, another hurdle in numerical processing is posed [16]. The
proper mapping of categorical attributes must be performed before processing the dataset for generating a representational model.
We herein develop general representational automata using a DNN with variable numbers of hidden layers and hidden layer
neurons. Linear classifiers and the FFNN are used to obtain the fitness of a labeled dataset [17]. Representational automata provide
many useful insights particularly for energy and financial related data. The primary use of these automata is to determine the
behavior of the attribute group and their relationships. The correlation between attributes is a metric to define the relationship
between them. Representational automata generation explains the process of utilizing correlation and classification accuracy
measurements to form a data model. These automata are also used to identify the relevance of an attribute group (Powerset) in a
dataset, and the feasibility of adding a new attribute to the existing dataset. This novel method of unlabeled data representation is envisaged from the classification accuracy of attribute powersets. It is a minimal representation with states as powerset
combinations. Because it is obtained from data frames, this state transition diagram cannot be assessed with the real properties
of a finite automata. Similar to the pictorial representation of K-means clustering that forms an initial layout of data instances
for the researcher, this representation provides a quick summary of the underlying attribute relations. It projects the operational
characteristics of various attribute combinations by transition diagrams and new output states, which can be utilized subsequently
by other concepts in machine learning methods. Improving the classification accuracy is one of the main goals of the representational
methods. It would be helpful to have a real-time model that represents the behavior of the attributes in terms of changes in classification accuracy.
The salient contributions of this paper are as follows.

• A novel unsupervised minimum attribute instance selection (UMAIS) labeling algorithm that selects a categorical attribute as
a class label.


• A novel attribute based powerset generation (APSG) algorithm for describing the formation of relevant attribute sets using
correlation and powerset.
• Generation of representational automata (a real-time pictorial model) that illustrate the importance of interactions among
correlated and non-correlated attributes in data.

The remainder of this paper is organized as follows. Section 2 presents the literature review pertaining to the current work. Section 3
describes the proposed work comprising an unsupervised minimal instance attribute selection algorithm and representational
automata formation in detail. Section 4 discusses the results obtained from two case studies based on the comparison with other
methods, loss function with accuracy, hidden-layer neurons with number of iterations, automata diagrams for linear and DNN
classifiers, and their analyses. Section 5 presents the conclusion and future scope of the work.

2. Literature review: representational learning and methods

Representational learning helps to view or present data clearly. Several mathematical models have been developed to describe
the behavior and trend of data. These models are either in the form of equations, diagrams, or data transformations. Representational
learning methods provide valuable insights into the data from various dimensions. The successful representation of underlying data
will facilitate the subsequent steps of feature learning. The implications from a data representation result in recognizing various
features from a dataset [18].
A successful method of data representation is principal component analysis (PCA) [19] that represents numerical data by
dimensionality reduction using eigenvectors. Following this algebraic method of dimensionality reduction, several other methods,
e.g., linear discriminant analysis (LDA), generalized discriminant analysis (GDA), and kernel PCA are established for feature
learning [20]. Feature learning and representation are used extensively in image processing for clustering, classification, and object
identification [18,21].
Representational learning can be grouped broadly into two major categories: global feature learning and manifold learning.
Global feature learning pertains to features that can represent data as transformations with fewer dimensions. It concentrates on
global features and data values. Global feature learning algorithms include independent component analysis (ICA), multitask feature
learning, canonical correlation analysis (CCA), and ensemble learning-based feature extraction. Traditional methods are used in
combination with modified parameters, resulting in hybrid methods that demonstrate better performance. GDA, Kernel PCA, 2DPCA,
and 2DLDA are some of the hybrid methods applied in specific use cases. For example, 2DLDA uses LDA for applications using second-
order tensors. Sammon mapping (a dimensionality reduction algorithm commonly used in exploratory data analysis) and Kernel PCA (KPCA) are used to produce structures from high-dimensional data [22–24]. Manifold learning algorithms aim to
obtain the local features of data rather than global features. Manifold methods obtain the structure of data by aggregating the local
features [18,25].
Krupka et al. [26] proposed a model of local parameters containing attributes using rough set and fuzzy methods. They
used RSTbox to formulate the rules for the model. Sánchez-Rada et al. [27] proposed a manifold learning approach called Onyx.
This model is used to represent emotions from social networking. Li et al. [28] proposed a graph-based approach that focused on
concept factorization. This factor-based method is used in dimensionality reduction, feature learning, and image clustering, devoid
of primary label information.
Representational learning demonstrates inherent challenges in its mode of presentation [22]. It is unclear whether a single set of
representations is adequate for all problems, or if a change in representation is required in accordance with the variation of problem instances. Performing experiments in a systematic and standard manner while maintaining the usefulness of the method for specific
tasks is challenging. Various metrics suitable for this type of evaluation must be considered before reaching the conclusion on
the representational method. Exploring the properties of problem instances from the representational model can be achieved only
if the model is sufficiently simple to explain the attribute interactions of data. Researchers focus on developing new variants of
representational models that follow well-defined steps from algorithmic mapping to validation with different datasets.
Relational logic methods are used to develop representational models. In most cases, an initial clustering is to be performed with
unsupervised or supervised algorithms. In such cases, three fundamental issues arise: contents of clusters, the methods to design
and interpret relational objects, and the number of initial clusters [29]. Further, the data must be modeled in the form of graphs
with specific description regarding the vertices and edges. The relation between the clusters is bound by different similarity and
dissimilarity measurements such as neighborhood tree similarity, attribute-wise dissimilarity, connection dissimilarity, and edge
distribution dissimilarity.
Cao et al. [30] proposed a representational learning method for graph data, particularly on vertex-based vector mapping
that captures the structure of graph data. They focused on representing graph elements, i.e., vertices and edges as more feasible
linear structures. Once the representation is performed, several deep learning principles are applied over this model to refine the
information content in the representation using algorithms such as the random walk in DeepWalk, skip-gram with negative sampling, the point-wise mutual information (PMI) matrix, and singular value decomposition (SVD).
Lee et al. [31] provided a diverse insight into representational learning through a filter-based approach in naive Bayes classification. Weights are assigned using a new paradigm for feature learning in classification. This new information-theoretic paradigm improves the functionality of the naive Bayes classification algorithm.
High-dimensional datasets are represented in a three-coordinate system using the k-means clustering algorithm. Luo et al. [32]
described a distance-based mapping of data features into a three-vector form. This three-vector representation forms the three


Fig. 1. Structural flow of representational automata generation.

cluster centers of the high-dimensional data. This clustering method is extremely useful to represent the knowledge residing in
high-dimensional datasets.
Esmaeilzehi et al. [33] proposed a new classifier based on a sparse representation of non-parametric kernels. It explains
the instances of a class with instances of another class represented by the linear combination of sparse vectors. It describes a
non-parametric kernel space representation classifier (NKSRC) to avoid dependency problems in KSRC.
Information extraction from embeddings in social images forms the basis of deep multi-view learning [34]. A weighted linkage
network is created using link information in social images, thereby producing a new representation of their connectivity. The types
of views considered in this method are visual content, text embeddings, and the relationship between text descriptions.
Representational learning models are useful in solving real-world problems. Deepika et al. [35] described a meta-learning
framework for predicting drug-drug interactions. Drug information from different sources is used to generate various feature
networks, thus facilitating the identification of the necessary interactions between drugs. A semi-supervised learning framework
with bagging SVM and the Node2Vec network representation method is used for the experimentation.
Tavanaei et al. [36] explained an event-based method for a spiking neural network to perform feature extraction. A representation
learning rule named spike timing dependent plasticity followed by a threshold-based vector quantization is proposed to identify the
features from MNIST and natural image datasets. Fine tuning the representation model by the threshold adjustment rule results in
the formation of a spiking visual representation.
Mhiri et al. [37] proposed a novel hierarchical handwriting representation in an unsupervised environment. The representation
involves the compression and encoding of image-based documents, followed by the ranking of features. This representation model
is later processed by spherical K-means to learn the features from the document.
Peng et al. [38] proposed a learning model for social network analysis using incremental term representation. It identifies
social network characteristics such as user behavior, influence propagation, and community influence. They discussed two typical
representation learning methods, namely the neural-network models and non-negative constraints matrix factorization models. They
designed a hybrid matrix factorization model to obtain the term co-occurrence resulting from a representation model for social
network analysis.
Alam et al. [39] proposed a novel deep generative model for representation learning. They adopted the task of improving data
representation efficiency by altering the neural network architecture. They proposed a recurrent neural network model known as
deep simultaneous recurrent belief network (D-SRBN) to generate efficient representations from unlabeled data. D-SRBN is designed
to process time-independent image data.
Yuan et al. [40] proposed a method named Wave2Vec that processes electronic health records (EHR) for producing meaningful
deep representations. This method establishes connectivity between bio-signal processing and semantic learning. The proposed model
can process motif co-occurrence information and time-series information of bio-signals from EHRs.
Lesort et al. [41] described different state representation learning (SRL) models based on control theory abstraction. This category
of representation model is well suited for projecting the features in real-time robotic learning scenarios. SRL methods adopt concepts
from reinforcement learning and the Markovian model to describe state-based features.
Anselmi et al. [42] proposed a group theoretic equivariance of symmetries that are used in representation learning. Symmetry-
based learning can reveal various dimensions of information without considering the numerical significance of data. They described
the need for tuning regularization parameters in the CNN to identify the invariances in the symmetry of data.
Xiong et al. [43] proposed a smart recommendation from API descriptions using representation learning. API descriptions are
used for generating a representation model that is used to recommend the appropriate web services to software developers such
that their development process can be accelerated. API descriptions are processed through a natural language processing pipeline
to extract the required features for the representation model.
Most of the existing methods described in our literature survey focus on all attributes irrespective of their importance. Table 1
summarizes various representational methods discussed in Section 2. The behavior of a group of attributes contributing to the
performance of an algorithm (supervised/unsupervised) in terms of its accuracy is not addressed by the current representational
methods. It is a challenging task to identify the attribute interactions and their contributions to the performance of the algorithm. It


Table 1
List of methods for representational feature learning.
Author Representational method Features Remarks
Zhong et al. [18] (2016) Representational Learning methods Feature engineering. Recent trends in deep learning and
feature learning.
Jolliffe [19] (2016), Gkalelis et al. PCA, LDA and dimensionality Feature reduction, discriminant based methods. Linear
[20] (2017) reduction methods supervised discriminative global feature learning.
Bouguelia et al. [22] (2017), Feng Global and manifold feature learning Feature learning algorithms like ICA, CCA and ensemble
et al. [23] (2017), Oja [24] (2004) based methods. Variants of PCA, LDA and Sammon
mapping.
Krupka et al. [26] (2014) Learning using rough sets and fuzzy Feature learning using local parameters, rule models and
methods fuzzy based learning methods.
Sánchez-Rada et al. [27] (2017), Manifold feature learning Aggregating local features from data, representation of
Tenenbaum [25] (1998), Peng et al. social networking information.
[38] (2018)
Li et al. [28] (2017), Cao et al. [30] Graph-based representational learning Factor-based technique for dimensionality reduction,
(2015) vertex-based vector mapping, and deep learning to
extract information from the model.
Sebastijan [29] (2017) Relational logic based learning Representational models developed using relational objects.
Addresses the clustering of unlabeled data.
Lee et al. [31] (2018) Filter based naive Bayes Classification Theoretic paradigm for naive Bayes classification using
weights assigned to the features.
Luo et al. [32] (2018) Three-coordinate system Distance based method, knowledge representation of high
representation dimensional datasets.
Esmaeilzehi et al. [33] (2017) Sparse representation of Data instances as linear combinations of sparse vectors.
non-parametric kernels Discusses a non-parametric method of representing the data.
Huang et al. [34] (2018), Alam et al. Deep multiview learning method, Representation of social images and time dependent
[39] (2018) Deep reinforcement learning images. Text embeddings and their relationship.
Deepika et al. [35] (2018) Meta-learning framework Representing the drug-drug interaction, semi-supervised
learning with SVM and Node2Vec network representation.
Tavanaei et al. [36] (2018) Event-based spiking neural network Feature extraction from MNIST and natural image datasets,
spiking visual representation methods.
Mhiri et al. [37] (2018) Hierarchical representation of Unsupervised method which involves compression and
handwriting ranking of image-based documents.
Yuan et al. [40] (2019) Wave2Vec: Electronic Health Record Time-series bio-signals processing and deep representations.
processing
Lesort et al. [41] (2018) State Representation Learning (SRL) Real-time robotic feature learning and extraction.
Reinforcement learning with state-based features.
Anselmi et al. [42] (2019) Symmetry-based learning Parameter tuning methods in CNN for obtaining the
invariances in data symmetry.
Xiong et al. [43] (2018) Smart API recommendation API descriptions used to build representation model, web
services recommendation for software developers.

is more helpful if a model could explain these insights minimally. Hence, we develop general representational automata that exhibit
the interaction of attributes or attribute groups on the performance of machine learning algorithms using DNNs.

3. Representational automata: DNN based visualization model for attribute interactions

Representational automata are state transition diagrams formed by the attribute combinations of data. Although different
methods can be used to combine the attributes, rough sets are generally preferred for forming attribute combinations based on
the value of data instances. The maximum number of rough sets that can be formed with n attributes is 2^n [44]. If a large number of attributes exist in the data, then forming 2^n attribute sets is a complex task. Hence, we formed our attribute combinations using
correlation. We formulated a method using powerset in combination with correlated and less correlated attributes for the formation
of representational automata. The steps involved in the generation of automata are illustrated in Fig. 1.
The noticeable modules in the structure include the UMAIS algorithm, attribute power set generation, modified powerset
generation, and powerset-based representational automata generation (PRAG) algorithm.

3.1. Unsupervised minimum attribute instance selection algorithm (UMAIS)

The UMAIS algorithm aims to group instances based on the minimum instance value of an attribute and perform the labeling
of data instances. The minimum instance value of an attribute is the minimum count of unique instances for a particular attribute.


Fig. 2. The flow diagram of the unsupervised minimum attribute instance selection (UMAIS) algorithm.

Table 2
An instance of a dataset containing both categorical and numerical attributes.
𝐶1 𝐶2 𝐶3 𝑁1 𝑁2
A A A 𝑛1 𝑚1
B B A 𝑛2 𝑚2
B B B 𝑛3 𝑚3
B B B 𝑛4 𝑚4
C C C 𝑛5 𝑚5
C C C 𝑛6 𝑚6
D C D 𝑛7 𝑚7

Table 3
Labeled dataset obtained by applying UMAIS algorithm.
𝐶1 𝐶2 𝐶3 𝑁1 𝑁2
A 0 A 𝑛1 𝑚1
B 1 A 𝑛2 𝑚2
B 1 B 𝑛3 𝑚3
B 1 B 𝑛4 𝑚4
C 2 C 𝑛5 𝑚5
C 2 C 𝑛6 𝑚6
D 2 D 𝑛7 𝑚7

Tables 2 and 3 present the process of selecting a categorical attribute as a label based on the minimum count of unique instances.
Various steps in the algorithm are shown in Fig. 2. The preprocessing steps over the dataset are performed to handle the categorical
attributes. For experimentation purposes, we select a single categorical attribute as a label for the dataset.
Algorithm 1 accounts for the formation of labeled dataset 𝐿𝐷𝑛−1 , where n corresponds to the number of attributes in the set 𝑆𝐴
from the dataset. 𝑆𝐴𝐼𝐶 is the preprocessed set, where each element is of the form (a, 𝐼𝑎 , c), where a ∈ 𝑆𝐴 , 𝐼𝑎 is the set of unique
instances of an attribute a containing the elements { 𝑖1 , 𝑖2 , … , 𝑖𝑘 }, 𝑖𝑘 is the 𝑘th unique instance of the attribute 𝑎, and c is the count
of unique instances. LA is the attribute selected as a label. The functions unique(A) and count(unique(A)) produce the unique set 𝐼𝑎
and the count ‘c’ of the elements in 𝐼𝑎 , respectively. M is the mapped label set that contains the details of the attribute which is
selected as a class label. Each element of the set 𝑀 is of the form (x, y, z), where x = 𝐴𝑚𝑖𝑛 , y ∈ 𝐼𝑚𝑖𝑛 , and ‘z’ is the label id, 0 ≤
z ≤ c - 1. 𝐴𝑚𝑖𝑛 is the name of the attribute having the minimum number of unique values ‘c’. 𝐼𝑚𝑖𝑛 is the set of unique instances of
the attribute 𝐴𝑚𝑖𝑛 . 𝑙𝑘 is a label variable to store the index of unique instance 𝑖𝑘 . The dataset instances of the 𝑗th row 𝐷𝑛 [𝑗].𝑖𝑘 are
replaced with the corresponding labels (𝑙1 , 𝑙2 , … , 𝑙𝑘 ).
Consider the instance of a dataset described in Table 2. The given dataset contains both categorical and numerical at-
tributes. Categorical and numerical attributes are represented as 𝐶𝑖 and 𝑁𝑗 respectively. The set of attributes 𝑆𝐴 contains
{ 𝐶1 , 𝐶2 , 𝐶3 , 𝑁1 , 𝑁2 }. The categorical attributes are suitable candidates for a class label since they have a smaller number of unique instances compared to the numerical attributes [45]. 𝑆𝐴𝐼𝐶 , known as the set of categorical attribute instance counts, is obtained as
{ (𝐶1 , { 𝐴, 𝐵, 𝐶, 𝐷 }, 4), (𝐶2 , { 𝐴, 𝐵, 𝐶 }, 3), (𝐶3 , { 𝐴, 𝐵, 𝐶, 𝐷 }, 4) }. Hence the mapped set M is { (𝐶2 , 𝐴, 0), (𝐶2 , 𝐵, 1), (𝐶2 , 𝐶, 2)
}. The final values of 𝐴𝑚𝑖𝑛 , 𝐼𝑚𝑖𝑛 , and 𝑐 are 𝐶2 , {𝐴, 𝐵, 𝐶} and 3 respectively. The mapped set M is used to label the dataset with class
labels. The labeled dataset after applying the UMAIS algorithm is shown in Table 3. The attribute 𝐶2 is selected as the class label
and the unique instances of the 𝐶2 are labeled from 0 to 2, present in each row of the dataset 𝐷𝑛 .
The UMAIS algorithm poses several questions, including the selection among categorical attributes that have the same number of unique attribute values. If two attributes contain the same number of unique attribute values, one of them can be randomly regarded as the label attribute.
Another concern regarding the algorithm is the minimum selection property. The minimum number of attribute instances impedes
the formation of more groups in the dataset. For example, if we consider the categorical attribute 𝐶3 , four attribute instances exist:
{ A, B, C, D }. If we could select this as a label, four clusters would exist in the dataset. However, the number of dataset instances in
each group may be small; this may adversely affect the training performance, thereby affecting the objective of the proposed model.


Algorithm 1: Unsupervised Minimum Attribute Instance Selection Algorithm (UMAIS)


Input: A Dataset 𝐷𝑛 containing Attribute Set 𝑆𝐴 = { 𝐴1 , 𝐴2 , 𝐴3 , ... , 𝐴𝑛 }
Output: A Labeled Dataset 𝐿𝐷𝑛−1 = { 𝐴1 , 𝐴2 , 𝐴3 , ... , 𝐴𝑛−1 , 𝐿𝐴 }
Initialization: 𝑆𝐴𝐼𝐶 = 𝛷, 𝑀 = 𝛷, 𝐼𝑚𝑖𝑛 = 𝛷;
while (| 𝑆𝐴 | ≠ 𝛷) do
A ∈ 𝑆𝐴
𝑆𝐴𝐼𝐶 ← 𝑆𝐴𝐼𝐶 ∪ (A, unique(A), count(unique(A)))
𝑆𝐴 ← 𝑆𝐴 - A
min ← ∞
repeat
(a, i, c) ∈ 𝑆𝐴𝐼𝐶
if (c < min) then
min ← c
𝐴𝑚𝑖𝑛 ← a
𝐼𝑚𝑖𝑛 ← i
𝑆𝐴𝐼𝐶 ← 𝑆𝐴𝐼𝐶 - (a, i, c)
until | 𝑆𝐴𝐼𝐶 | = 𝛷;
k←0
repeat
𝑖𝑘 ∈ 𝐼𝑚𝑖𝑛
𝑙𝑘 ← k
𝐼𝑚𝑖𝑛 ← 𝐼𝑚𝑖𝑛 - 𝑖𝑘
M ← M ∪ (𝐴𝑚𝑖𝑛 , 𝑖𝑘 , 𝑙𝑘 )
k←k+1
until k ≠ c;
j←0
while (𝑗 ≠ | 𝐷𝑛 |) do
select(𝐴𝑚𝑖𝑛 , 𝑖1 ), (𝐴𝑚𝑖𝑛 , 𝑖2 ) ... (𝐴𝑚𝑖𝑛 , 𝑖𝑐 ) from M and 𝐷𝑛 [𝑗]
replace(𝐷𝑛 [𝑗].𝑖1 , 𝐷𝑛 [𝑗].𝑖2 ... 𝐷𝑛 [𝑗].𝑖𝑘 ) with (𝑙1 , 𝑙2 , ... 𝑙𝑘 )
𝑗←𝑗+1
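The following is a compact pandas sketch of the UMAIS labeling step (an illustrative re-implementation of Algorithm 1, not the authors' code), using the toy frame of Table 2; the assignment of label ids to unique instances in alphabetical order is an assumption.

import pandas as pd

def umais_label(df, categorical_cols):
    # S_AIC: count of unique instances per categorical attribute; A_min is the attribute with the fewest.
    counts = {col: df[col].nunique() for col in categorical_cols}
    label_attr = min(counts, key=counts.get)
    # Mapped label set M: each unique instance of A_min receives a label id in 0..c-1.
    mapping = {v: k for k, v in enumerate(sorted(df[label_attr].unique()))}
    labeled = df.copy()
    labeled[label_attr] = labeled[label_attr].map(mapping)
    return labeled, label_attr, mapping

# Toy frame following Table 2 (numerical columns abbreviated).
df = pd.DataFrame({'C1': list('ABBBCCD'), 'C2': list('ABBBCCC'), 'C3': list('AABBCCD'),
                   'N1': range(7), 'N2': range(7)})
labeled, label_attr, mapping = umais_label(df, ['C1', 'C2', 'C3'])
print(label_attr, mapping)   # C2 {'A': 0, 'B': 1, 'C': 2}, as in Table 3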

3.2. Correlation-based attribute powerset generation (APSG)

APSG is designed to produce attribute combinations. The input to this algorithm contains a set of correlated attributes. If we
consider the equivalence classes from the attribute perspective, it results in the generation of groups based on the property of
attributes. The accuracy variation of each powerset combination of correlated attributes with every non-correlated attribute results in the formation of the representational automata. The work described in this section uses the coefficient of correlation as the
property to group the attributes. The first step is to determine the correlation values between the attributes of the dataset. We select
the set of attributes whose pairwise correlation is greater than the threshold value of 0.5.

Algorithm 2: Attribute PowerSet Generation (APSG)


Input: A Labeled Dataset 𝐿𝐷𝑛−1 = { 𝐴1 , 𝐴2 , 𝐴3 , ... , 𝐴𝑛−1 }
Output: A Powerset Combination, 𝑃𝑖 and 𝑃𝑓 , More Correlated Attribute Set 𝑆𝐶 , Less Correlated Attribute Set 𝑆𝑁𝐶
Initialization: 𝑆𝐶 = 𝛷, 𝑆𝑁𝐶 = 𝛷, 𝑃𝑖 = 𝛷, 𝑃𝑓 = 𝛷, Threshold = 0.5;
Read Dataframes, 𝐿𝐷𝑛−1 ;
𝑀𝑐𝑜𝑟𝑟𝑒 = generateCorrelation(𝐿𝐷𝑛−1 );
𝑈 𝑃𝑐𝑜𝑟𝑟𝑒 = upperTriangle(𝑀𝑐𝑜𝑟𝑟𝑒 );
𝑆𝐶 = findAttribute(𝑈 𝑃𝑐𝑜𝑟𝑟𝑒 , 𝑢𝑝𝑖𝑗 ≥ Threshold);
𝑆𝑁𝐶 = attributes(𝐿𝐷𝑛−1 ) - 𝑆𝐶 ;
𝑃𝑖 = powerSet(𝑆𝐶 );
while (𝑃𝑖 ≠ 𝛷) do
𝑒𝑖 ∈ 𝑃𝑖
if (|𝑒𝑖 | ≥ 2) then
𝑃𝑓 ← 𝑃𝑓 + 𝑒𝑖
𝑃𝑖 ← 𝑃𝑖 - 𝑒𝑖

The APSG method described in Algorithm 2 results in the formation of the required attribute powerset, and focuses on n - 1 attributes
where n is the total number of attributes present in the dataset 𝐷𝑛 . The correlation matrix formed by the generateCorrelation() method
by consuming n - 1 attributes is represented as 𝑀𝑐𝑜𝑟𝑟𝑒 . The upperTriangle() method uses the correlation matrix 𝑀𝑐𝑜𝑟𝑟𝑒 to produce the upper triangular correlation values 𝑈 𝑃𝑐𝑜𝑟𝑟𝑒 . The attributes of the dataset are extracted by the function attributes(). 𝑆𝑁𝐶 contains the attributes whose correlations are less than the threshold value. The function findAttribute() results in

165
Johnpaul C.I., M.V.N.K. Prasad, S. Nickolas et al. Data & Knowledge Engineering 122 (2019) 159–180

Fig. 3. General N-layered feed forward neural network.

the selection of attributes having a correlation more than a threshold. The function powerSet() generates powersets from the more
correlated attributes 𝑆𝐶 and stores them in 𝑃𝑖 . To compute the data-frame accuracy using other classifier algorithms, the minimum
number of attributes in each element of powerset 𝑃𝑖 is fixed as two. 𝑃𝑓 is produced from 𝑃𝑖 by eliminating any single-attribute element 𝑒𝑖 or the empty set, i.e., elements for which |𝑒𝑖 | is zero or one.
Consider a scenario in which the attribute set 𝑆𝐴 contains 5 attributes namely 𝑎1 , 𝑎2 , 𝑎3 , 𝑎4 and 𝑎5 . The generateCorrelation()
method finds the correlation among these five attributes. The dimension of the correlation matrix between these attributes is 5 × 5. Since the correlation matrix is symmetric, it is enough to consider the upper or lower triangular matrix to obtain the whole details of the correlation. Assume that the set 𝑆𝐶 contains the correlated attributes, namely 𝑎2 , 𝑎3 and 𝑎5 . The set 𝑆𝑁𝐶 contains
the non-correlated attributes obtained by the difference of 𝑆𝐴 and 𝑆𝐶 . 𝑃𝑖 is a set that contains the initial powerset combinations
formed by the elements of set 𝑆𝐶 containing {𝑎2 }, {𝑎3 }, {𝑎5 }, {𝑎2 , 𝑎3 }, {𝑎2 , 𝑎5 }, {𝑎3 , 𝑎5 }, {𝑎2 , 𝑎3 , 𝑎5 } and { }. 𝑃𝑓 is the final powerset
combinations obtained from 𝑃𝑖 by discarding the empty and singleton sets. 𝑃𝑓 contains the following elements namely {𝑎2 , 𝑎3 },
{𝑎2 , 𝑎5 }, {𝑎3 , 𝑎5 } and {𝑎2 , 𝑎3 , 𝑎5 }.
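A minimal pandas/itertools sketch of the APSG step (an illustrative rendering of Algorithm 2, not the authors' code) is given below; the absolute-value correlation and the alphabetical ordering of attributes are assumptions.

import itertools
import pandas as pd

def apsg(df, threshold=0.5):
    # M_corre: pairwise correlation matrix of the n-1 non-label attributes.
    corr = df.corr().abs()
    cols = list(corr.columns)
    s_c = set()
    # Scan the upper triangle UP_corre and keep attribute pairs whose correlation reaches the threshold.
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] >= threshold:
                s_c.update([a, b])
    s_nc = [c for c in cols if c not in s_c]          # less-correlated attributes S_NC
    # P_f: powerset of S_C restricted to subsets of two or more attributes.
    p_f = [set(s) for r in range(2, len(s_c) + 1)
           for s in itertools.combinations(sorted(s_c), r)]
    return sorted(s_c), s_nc, p_f

# With attributes a1..a5 and correlated set {a2, a3, a5}, p_f would contain
# {a2, a3}, {a2, a5}, {a3, a5} and {a2, a3, a5}, as in the example above.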

3.3. Design of deep learning neural network classifiers in terms of tensors

Data frames generated from Algorithm 2 contain sufficient attribute values combined with the class labels. In our proposed work, we used three classifiers, namely the general N-layered FFNN, the DNN, and the linear classifier, designed with the basic functionalities of TensorFlow. The TensorFlow-based design of a general N-layered FFNN is illustrated in Fig. 3.
The input to the neural network is labeled data containing input attributes and class labels. Preprocessing the labeled dataset is
essential for maintaining uniformity among the data values. Preprocessed data are used by attribute extraction procedures to obtain
the attribute values as column vectors. For experimentation, the training set data are considered as real numbers, and the labels are in one-hot form such that they match the dimensions of the weight matrix and input values. All the attributes from the input layer are designated as 𝑋𝑖𝑛 , and the output is represented as 𝑌𝑜𝑢𝑡 . The hidden layers are defined from 𝐻1 to 𝐻ℎ𝑛 . If the count of the input attributes is C, the hidden layer contains 𝐻𝑛 neurons, and m output classes exist, then the input weight matrix is of dimensions C × 𝐻𝑛 , and the final hidden layer weight matrix is of dimensions 𝐻𝑛 × m. The neuronFunctions are the specific activation functions into
which the multiplied value of weights and inputs are fed. Traditionally, the most used neuronal functions are sigmoid and tanh. The
output layer produces the 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑒𝑑_𝑌 . It is compared with the original output Y to obtain the loss of accuracy. Based on the loss
values, the loss optimizer distributes the gradient of loss over the layers by adjusting the weight matrix, resulting in weight update
and thereby preparing the whole neural network for further iterations. The experimentation performed over these algorithms is described in Section 4.
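As an illustration of the design described above, a minimal TensorFlow 1.x sketch of a single-hidden-layer FFNN follows; the layer sizes, random placeholder data, sigmoid activation, and the learning factor of 0.005 are illustrative assumptions rather than the exact experimental configuration.

import numpy as np
import tensorflow as tf   # TensorFlow 1.x API, matching the stack used in Section 4

C, H_n, m = 8, 100, 6                                  # input attributes, hidden neurons, output classes
X_in = tf.placeholder(tf.float32, [None, C])           # attribute values
Y = tf.placeholder(tf.float32, [None, m])              # one-hot class labels

W1 = tf.Variable(tf.random_normal([C, H_n]))           # C x H_n weight matrix
b1 = tf.Variable(tf.zeros([H_n]))
H1 = tf.nn.sigmoid(tf.matmul(X_in, W1) + b1)           # neuronFunction: sigmoid activation
W2 = tf.Variable(tf.random_normal([H_n, m]))           # H_n x m weight matrix of the final hidden layer
b2 = tf.Variable(tf.zeros([m]))
logits = tf.matmul(H1, W2) + b2                        # Estimated_Y (pre-softmax)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.005).minimize(loss)   # loss optimizer updating the weights

train_x = np.random.rand(320, C).astype(np.float32)                  # placeholder data of the right shape
train_y = np.eye(m)[np.random.randint(0, m, 320)].astype(np.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2000):                                          # iterations, cf. Fig. 5
        _, l = sess.run([train_op, loss], feed_dict={X_in: train_x, Y: train_y})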

3.4. Powerset-based representational automata generation algorithm (PRAG)

The PRAG algorithm generates the pictorial representation called representational automata using the classification accuracy
measurements of the powerset and the modified powerset data frame obtained by neural network classifiers described in Section 3.3.
The accuracy corresponding to a powerset data frame is stored in the form of an ordered pair (data frame, accuracy). Representational automata generation is the major feature of the PRAG algorithm described in Algorithm 3. Fig. 4 presents the flow diagram
of the powerset-based representational automata generation (see algorithms 3, 4, and 5).
The general input to this algorithm is the labeled dataset. The output generated by the algorithm is a transition state diagram
that contains three new states: increment transition state (ITS), decrement transition state (DTS), and equal transition state (ETS).
The ITS is obtained when the accuracy improves on combining the powerset with less-correlated attributes. The DTS is formed when the accuracy of the powerset decreases in combination with less-correlated attributes. The ETS is formed when the accuracy does not change after combining the powerset with less-correlated attributes. The powerset of correlated attributes along with the


Fig. 4. The flow-diagram of powerset based representational automata generation algorithm.

measured value of accuracy on using one of the classifier cl comprises the set 𝑃 𝑆𝑎𝑐𝑐 . Each powerset element pse is modified by the
function modifiedPowerSet by adding a less-correlated attribute individually from the set 𝑆𝑁𝐶 .
The modified powerset values are stored in a set called MPS. If the powerset contains 𝑚 elements and |𝑆𝑁𝐶 | = 𝑛, then | MPS | will be mn. 𝑝𝑠𝑐 maintains the
count of powerset values during the iteration process. Each element in the modified powerset mpse is obtained and the corresponding
accuracy is calculated by the classifier cl. mpse together with the accuracy is stored in the set 𝑀𝑃 𝑆𝑎𝑐𝑐 . 𝑚𝑝𝑠𝑐 maintains the count
of iterations on 𝑀𝑃 𝑆𝑎𝑐𝑐 for all the accuracy computations. The sets 𝑃 𝑆𝑎𝑐𝑐 , 𝑀𝑃 𝑆𝑎𝑐𝑐 , and 𝑆𝑁𝐶 are used as the primary parameters
by the function automataGeneration for the representational automata generation in real time. The 𝑆𝑁𝐶 is the set of less-correlated
attributes obtained by Algorithm 2. The modified power set generation is described in Algorithm 4.
The modifiedPowerSet function obtains the powerset element pse. The modified powerset is generated based on the combination
of each non-correlated attribute with every element in the set 𝑃 𝑆. The function 𝑔𝑒𝑡𝑃 𝑜𝑤𝑒𝑟𝐴𝑡𝑡𝑟𝑖𝑏𝑢𝑡𝑒() selects the element pse from the
powerset PS and stores in 𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡 . Each non-correlated attribute 𝑛𝑐𝑒𝑙𝑒𝑚𝑒𝑛𝑡 is obtained from the set 𝑆𝑁𝐶 by the function 𝑔𝑒𝑡𝐸𝑙𝑒𝑚𝑒𝑛𝑡
with the index 𝑝𝑐 . Each element from the set 𝑆𝑁𝐶 is selected as 𝑛𝑐𝑒𝑙𝑒𝑚𝑒𝑛𝑡 , and its set union with 𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡 is taken to produce the modified powerset element 𝑚𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡 . These modified powerset elements are collected into the set 𝑀𝑃 𝑆𝑡𝑒𝑚𝑝 , which is returned to Algorithm 3. The function getClassifier is used to select the classifiers from TensorFlow. To identify the DNN and linear classifiers, they are assigned the values 1 and 2, respectively. As mentioned in Section 3.3, we used the DNN classifier and linear classifier to compute the accuracy
measurements and modify the powerset data frames.
The procedure automataGeneration, described in Algorithm 5, accepts 𝑃 𝑆𝑎𝑐𝑐 , 𝑀𝑃 𝑆𝑎𝑐𝑐 , and 𝑆𝑁𝐶 as parameters and decides the states to be included in the automata. It generates a minimal automaton that lacks some of the properties of a finite automaton; for example, this representational automaton has no designated starting state. After the initialization of the necessary program variables, the procedure starts by extracting elements from the set 𝑃 𝑆𝑎𝑐𝑐 . The PRAG algorithm calculates the accuracy of the classification for each powerset element. The accuracy and powerset elements are stored in the set 𝑃 𝑆𝑎𝑐𝑐 . The function getPsAccuracy obtains the accuracy value from a powerset element. 𝑇𝑅𝐴 is defined as the set that stores the transition information that
forms the automata. The generated automata are captured by the state transition diagram 𝑅𝐴 .
The powerset element accuracy is stored in the variable 𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 . Similar to the powerset elements, the modified powerset
elements are obtained through the variable 𝑚𝑝𝑠𝑟𝑎 . The function getMpsAccuracy extracts the accuracy of the modified powerset
elements from the set 𝑀𝑃 𝑆𝑎𝑐𝑐 , and stores it in 𝑚𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 . The variable nc is the count to keep track of the elements in the set of


less-correlated set 𝑆𝑁𝐶 . Each element in the set 𝑆𝑁𝐶 is obtained as 𝑛𝑐𝑒𝑙𝑒 . This step is essential to build the transition state for each
powerset element. As mentioned earlier, the three new states, i.e., ITS, ETS,

Algorithm 3: Powerset based Representational Automata Generation (PRAG)


Input: A Labeled Dataset 𝐿𝐷𝑛−1 = { 𝐴1 , 𝐴2 , 𝐴3 , ... , 𝐴𝑛−1 }
Output: Representational Automata 𝑅𝐴 Containing a powerset defined final states, Increment Transition State (ITS), Equal
Transition State (ETS) and Decrement Transition State (DTS) with a transition on non-correlated attributes
Initialization: 𝑃 𝑆𝑎𝑐𝑐 = 𝛷, 𝑀𝑃 𝑆𝑎𝑐𝑐 = 𝛷, MPS = 𝛷, 𝑝𝑠𝑐 = 0, 𝑚𝑝𝑠𝑐 = 0;
DF = readDataFrames(𝐿𝐷𝑛−1 );
F = prepareFeatures(DF);
Lb = prepareLabels(F);
PS = APSG(Lb);
while (𝑝𝑠𝑐 ≠ | PS |) do
pse ∈ PS
𝑀𝑃 𝑆 ← MPS + modifiedPowerSet(pse, 𝑆𝑁𝐶 )
df ← dataFrame(pse)
cl ← getClassifier(1)
accuracy ← cl(df)
𝑃 𝑆𝑎𝑐𝑐 ← 𝑃 𝑆𝑎𝑐𝑐 + (pse , accuracy)
PS ← PS - pse
𝑝𝑠𝑐 ← 𝑝𝑠𝑐 + 1
while (𝑚𝑝𝑠𝑐 ≠ | MPS |) do
mpse ∈ MPS
df ← dataFrame(mpse)
cl ← getClassifier(1)
accuracy ← cl(df)
𝑀𝑃 𝑆𝑎𝑐𝑐 ← 𝑀𝑃 𝑆𝑎𝑐𝑐 + (mpse , accuracy)
MPS ← MPS - mpse
𝑚𝑝𝑠𝑐 ← 𝑚𝑝𝑠𝑐 + 1
𝑅𝐴 ← automataGeneration(𝑃 𝑆𝑎𝑐𝑐 , 𝑀𝑃 𝑆𝑎𝑐𝑐 , 𝑆𝑁𝐶 )

and DTS are formed by the accuracy comparison of 𝑝𝑠𝑟𝑎 and 𝑚𝑝𝑠𝑟𝑎 . If the modified powerset accuracy is greater than its powerset
counterpart, then a new state called the ITS is established. Similarly, if the accuracy decreases, the DTS is established; if the
accuracies are equal, then ETS is formed. Transition set 𝑇𝑅𝐴 stores the transition information in the form (state, input, final-state)
where 𝑠𝑡𝑎𝑡𝑒 ∈ PS, input ∈ 𝑆𝑁𝐶 , and final-state ∈ { 𝐼𝑇 𝑆, 𝐸𝑇 𝑆, 𝐷𝑇 𝑆 }, respectively. All the states with the transitions are stored in
𝑅𝐴 and returned by the function. For creating nodes in real time, the function drawNode (starting state, destination state, edgename)
is invoked with the starting state ∈ 𝑃 𝑆, destination state ∈ { 𝐼𝑇 𝑆, 𝐸𝑇 𝑆, 𝐷𝑇 𝑆 }, and edgename ∈ 𝑆𝑁𝐶 as the parameters.

Algorithm 4: Modified Powerset Generation


Input : powerset elements, pse, 𝑆𝑁𝐶
Output: Modified Powerset MPS
Initialization: 𝑝𝑐 = 0, 𝑀𝑃 𝑆𝑡𝑒𝑚𝑝 = 𝛷;
begin procedure
repeat
𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡 ← getPowerAttribute(pse)
𝑛𝑐𝑒𝑙𝑒𝑚𝑒𝑛𝑡 ← getElement(𝑝𝑐 , 𝑆𝑁𝐶 )
𝑚𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡 ← 𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡 ∪ 𝑛𝑐𝑒𝑙𝑒𝑚𝑒𝑛𝑡
𝑀𝑃 𝑆𝑡𝑒𝑚𝑝 ← 𝑀𝑃 𝑆𝑡𝑒𝑚𝑝 ∪ 𝑚𝑝𝑒𝑙𝑒𝑚𝑒𝑛𝑡
𝑝𝑐 ← 𝑝𝑐 + 1
until (𝑝𝑐 ≥ |𝑆𝑁𝐶 |);
return 𝑀𝑃 𝑆𝑡𝑒𝑚𝑝
end procedure
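A one-function Python sketch of Algorithm 4 (an illustrative rendering, assuming powerset elements and 𝑆𝑁𝐶 are plain Python collections) is shown below.

def modified_powerset(pse, s_nc):
    # Combine one powerset element with each less-correlated attribute in turn (Algorithm 4).
    return [set(pse) | {nc} for nc in s_nc]

# With pse = {'a3', 'a5'} and s_nc = ['a1', 'a4'], this yields
# [{'a1', 'a3', 'a5'}, {'a3', 'a4', 'a5'}], matching the worked example that follows.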

Consider the instance described in Section 3.2. The set 𝑃𝑓 contains the required powerset combinations of three attributes
namely 𝑎2 , 𝑎3 and 𝑎5 . The modified powerset combinations are produced by combining the elements of 𝑃𝑓 with the non-correlated
attributes from the set 𝑆𝑁𝐶 . The elements of the modified powerset 𝑀𝑃 𝑆 are namely {𝑎2 , 𝑎3 , 𝑎1 }, {𝑎2 , 𝑎3 , 𝑎4 }, {𝑎2 , 𝑎5 , 𝑎1 }, {𝑎2 , 𝑎5 , 𝑎4
}, {𝑎3 , 𝑎5 , 𝑎1 }, {𝑎3 , 𝑎5 , 𝑎4 }, {𝑎2 , 𝑎3 , 𝑎5 , 𝑎1 } and {𝑎2 , 𝑎3 , 𝑎5 , 𝑎4 }. The dataframe values corresponding to every element of the sets 𝑃𝑓 and 𝑀𝑃 𝑆 are given to the classifiers to obtain the classification accuracies. These accuracy values are stored in the sets 𝑃 𝑆𝑎𝑐𝑐 and
𝑀𝑃 𝑆𝑎𝑐𝑐 respectively. The function automataGeneration processes the sets 𝑃 𝑆𝑎𝑐𝑐 , 𝑀𝑃 𝑆𝑎𝑐𝑐 and 𝑆𝑁𝐶 to produce the representational
structure 𝑅𝐴 . For instance, consider an element from the set 𝑃 𝑆 as {𝑎3 , 𝑎5 }. The classification accuracy of this element is obtained
from the set 𝑃 𝑆𝑎𝑐𝑐 . Similarly the modified powerset accuracy values of the respective element is obtained from 𝑀𝑃 𝑆𝑎𝑐𝑐 . If the


accuracy of {𝑎3 , 𝑎5 , 𝑎1 } is greater than that of {𝑎3 , 𝑎5 }, then the transition of {𝑎3 , 𝑎5 } on 𝑎1 results in an increment transition state (ITS). If the accuracy of {𝑎3 , 𝑎5 , 𝑎4 } is less than that of {𝑎3 , 𝑎5 }, then the transition of {𝑎3 , 𝑎5 } on 𝑎4 results in a decrement transition state (DTS).
Similarly the respective transition states are determined by comparing the accuracy values of every element in the sets 𝑃 𝑆𝑎𝑐𝑐 and
𝑀𝑃 𝑆𝑎𝑐𝑐 .

Algorithm 5: Representational Automata Generation


Input : 𝑃 𝑆𝑎𝑐𝑐 , 𝑀𝑃 𝑆𝑎𝑐𝑐 , 𝑆𝑁𝐶
Output: Representational Automata 𝑅𝐴 as in algorithm 3, Transition Set 𝑇𝑅𝐴
Initialisation: 𝑝𝑐 = 0, 𝑛𝑐 = 0, 𝑚𝑝𝑐 = 0, 𝑇𝑅𝐴 = 𝛷, 𝑅𝐴 = 𝛷;
begin procedure
repeat
𝑝𝑠𝑟𝑎 ∈ 𝑃 𝑆𝑎𝑐𝑐
𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 ← getPsAccuracy(𝑝𝑠𝑟𝑎 )
nc = 0
repeat
𝑚𝑝𝑠𝑟𝑎 ← getElement(𝑚𝑝𝑐 , 𝑀𝑃 𝑆𝑎𝑐𝑐 )
𝑚𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 ← getMpsAccuracy(𝑚𝑝𝑠𝑟𝑎 )
𝑛𝑐𝑒𝑙𝑒 = getElement(nc, 𝑆𝑁𝐶 )
if (𝑚𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 > 𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 ) then
𝑇𝑅𝐴 ← 𝑇𝑅𝐴 ∪ (𝑝𝑠𝑟𝑎 , 𝑛𝑐𝑒𝑙𝑒 , ITS)
𝑅𝐴 ← 𝑅𝐴 ∪ drawNode(𝑝𝑠𝑟𝑎 , ITS, edge = 𝑛𝑐𝑒𝑙𝑒 )
ElseIf (𝑚𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = 𝑝𝑠𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 ) then
𝑇𝑅𝐴 ← 𝑇𝑅𝐴 ∪ (𝑝𝑠𝑟𝑎 , 𝑛𝑐𝑒𝑙𝑒 , ETS);
𝑅𝐴 ← 𝑅𝐴 ∪ drawNode(𝑝𝑠𝑟𝑎 , ETS, edge = 𝑛𝑐𝑒𝑙𝑒 )
else
𝑇𝑅𝐴 ← 𝑇𝑅𝐴 ∪ (𝑝𝑠𝑟𝑎 , 𝑛𝑐𝑒𝑙𝑒 , DTS)
𝑅𝐴 ← 𝑅𝐴 ∪ drawNode(𝑝𝑠𝑟𝑎 , DTS, edge = 𝑛𝑐𝑒𝑙𝑒 )
end
𝑚𝑝𝑐 ← 𝑚𝑝𝑐 + 1
nc ← nc + 1
until (𝑛𝑐 ≥ |𝑆𝑁𝐶 |);
𝑝𝑐 ← 𝑝𝑐 + 1
until (𝑝𝑐 ≥ |𝑃 𝑆𝑎𝑐𝑐 |);
return 𝑅𝐴
end procedure
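The transition-state decision of Algorithm 5 can be sketched in Python as follows (an illustrative rendering, assuming the accuracy lookups are dictionaries keyed by frozensets of attribute names and the accuracy values are illustrative).

def transition_states(ps_acc, mps_acc, s_nc):
    # Compare each modified-powerset accuracy with its base powerset accuracy to decide ITS/ETS/DTS.
    transitions = set()
    for pse, base_acc in ps_acc.items():
        for nc in s_nc:
            mod_acc = mps_acc[pse | frozenset({nc})]
            if mod_acc > base_acc:
                state = 'ITS'
            elif mod_acc == base_acc:
                state = 'ETS'
            else:
                state = 'DTS'
            transitions.add((pse, nc, state))     # (state, input, final-state) triple of T_RA
    return transitions

# Illustrative accuracies for the {a3, a5} example above: a1 improves accuracy, a4 reduces it.
ps_acc = {frozenset({'a3', 'a5'}): 0.90}
mps_acc = {frozenset({'a1', 'a3', 'a5'}): 0.93, frozenset({'a3', 'a4', 'a5'}): 0.88}
print(transition_states(ps_acc, mps_acc, ['a1', 'a4']))   # transition to ITS on a1 and to DTS on a4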

4. Performance evaluation and visualization of representational automata

Representational automata generation experiments are performed on open-source tools and frameworks. We implemented them using Python 2.7 with supporting libraries, i.e., TensorFlow 1.5 (CPU version), pandas, NumPy, scikit-learn, and matplotlib. Most of the queries on the data are accomplished using the pandas framework. Graph generation is performed with matplotlib and gnuplot 5.0. Visualization of the representational automata is performed with a graph description language named graphviz (dot), and the rendering is accomplished with the open-source utilities neato and eog. The programs are executed on a desktop machine with an Intel Core i7-5500U CPU (2.40 GHz × 4), 16 GB of primary memory, and Ubuntu 16.04 LTS
64-bit operating system. We evaluated our proposed representational automata formulation using a renewable energy dataset and
a bankruptcy dataset.
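A sketch of the visualization step using the graphviz Python bindings is given below (assuming the graphviz system binaries are installed; the state names and edge labels are those of the example in Section 3.4, not the experimental values).

from graphviz import Digraph

def draw_automata(transitions, filename='representational_automata'):
    # Each (state, input attribute, final state) triple becomes a labeled edge in the dot graph.
    g = Digraph(format='png')
    for state, attr, final_state in transitions:
        src = '{' + ','.join(sorted(state)) + '}'
        g.node(src, shape='circle')
        g.node(final_state, shape='doublecircle')
        g.edge(src, final_state, label=attr)          # analogous to drawNode(start, destination, edgename)
    g.render(filename, cleanup=True)                  # writes representational_automata.png

# draw_automata({(frozenset({'a3', 'a5'}), 'a1', 'ITS'), (frozenset({'a3', 'a5'}), 'a4', 'DTS')})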

4.1. Case study 1

The proposed algorithms were evaluated using renewable energy data on the electricity supply system of continental Europe.
The data includes the demand and inflow of renewable energy for the time period 2012 to 2014 [46]. Power generation is
accomplished using solar and wind generators. We used the generator_info file that contains the generator information. It contains
the capacity and layout specifications of the generators. The attribute information of generator_info is shown in Table 4.
The attributes numbered from 1 to 4 are used only for the identification of data instances. Attributes numbered from 5 to 15 are
considered for experimentation purposes. In the execution of the proposed UMAIS over the data, the categorical attribute status is
selected as the label. No prior clustering was defined in the dataset, except the description of the attributes. The status attribute has
six values: operating fully, operating partially, no data in GEO, built and in test stage, decommissioned, and under construction. The
mapping for these values was performed by the UMAIS algorithm by assigning numerical class values from zero to five. Aggregating
the data instances to these classes shows that class 1 (operating fully) contains the most instances, and classes 3 to 5 contain the fewest. Among the empirically selected eight real-valued attributes, four attributes are selected as the most correlated attributes.
The first step in the automata generation is the generation of powersets with these four attributes. A set of less-correlated
attributes is also obtained as the difference between the attribute sets 𝑆𝐴 and 𝑆𝐶 . Four attributes demonstrate less correlation


Table 4
Description of attributes in generator_info dataset.
Si. No Attribute Type Description
1 ID Integer Generator Id
2 name String Name
3 country String Country details
4 origin NodeID Bus which is closest
5 latitude Decimal Coordinate of latitude
6 longitude Decimal Coordinate of longitude
7 status String Current Status
8 primaryfuel String Fuel used most
9 secondaryfuel String Fuel used least
10 capacity Decimal Maximum capacity
11 lincost Decimal Cost marginal
12 cyclecost Decimal Cycle plant cost
13 minuptime Integer Minimum time online in shutdown
14 mindowntime Integer Minimum time (start up)
15 minonlinecapacity Decimal Production details

Table 5
Evaluation metrics values of powerset with linear classifier.
Powerset Precision Recall F1-Score Accuracy
ps01 0.80 0.89 0.84 0.893750
ps02 0.88 0.22 0.26 0.218750
ps03 0.80 0.89 0.84 0.893750
ps04 0.91 0.93 0.92 0.934375
ps05 0.91 0.93 0.92 0.934375
ps06 0.91 0.93 0.92 0.934375
ps07 0.88 0.23 0.29 0.234375
ps08 0.91 0.93 0.92 0.934375
ps09 0.91 0.93 0.92 0.934375
ps10 0.89 0.92 0.89 0.915625
ps11 0.89 0.47 0.57 0.465625

Table 6
Evaluation metrics values of powerset with DNN classifier.
Powerset Precision Recall F1-Score Accuracy
ps01 0.80 0.89 0.84 0.89375
ps02 0.90 0.93 0.91 0.92812
ps03 0.89 0.92 0.89 0.91562
ps04 0.91 0.93 0.92 0.93437
ps05 0.91 0.93 0.92 0.93437
ps06 0.91 0.93 0.92 0.934375
ps07 0.89 0.92 0.89 0.915625
ps08 0.90 0.93 0.91 0.928125
ps09 0.90 0.93 0.91 0.928125
ps10 0.91 0.93 0.92 0.934375
ps11 0.80 0.89 0.84 0.89375

between them. Using the attribute set that has more correlations, the powerset is created while maintaining the minimum condition of two attributes per element. The modified powerset is created by combining each powerset element with every less-correlated attribute.
Automata visualization was also performed using dot language by mapping nodes as the states, and the edges as transitions from
the powerset. To incorporate uniqueness in the data values, all the decimal values are converted into real values by the necessary
program constructs. The results of each procedure, comparisons of the DNN accuracy, FFNNs, and linear classifiers are described in
Sections 4.1.1 and 4.1.2.

4.1.1. Implications of the FFNN, linear, and DNN classifiers on energy data
The results of the experiments performed on the FFNN by adjusting the number of iterations, number of hidden layer neurons,
and loss function estimation are shown in Fig. 5. The comparison of loss function and iterations illustrates the following behavior
of the FFNN classifier. Each line is formed by a neural network containing eight inputs, a number of hidden layer neurons, and
six different classes, which are obtained by the UMAIS algorithm. If the number of iterations increases, then more neurons are normalized to reach a zero-error state. When the iterations reach 2000 and 5000, the FFNN error loss subsequently ripples across the hidden layer neurons. It is also evident that the neural network training requires more iterations to converge to the best results. The accuracy of the FFNN having the maximum number of neurons is observed to be 89%. The results in Fig. 5 are obtained with a low learning factor 𝛼 of 0.005.


Fig. 5. Loss function versus iterations with variable number of hidden layer neurons.

The experiments on varying the learning factor over a predefined number of iterations are shown in Fig. 6. It shows that the frequency of the loss function oscillation is higher when a large learning factor is selected. Slow learning results in a neural network with a lower average loss. Fig. 7 shows another implication of the FFNN. This architecture of the neural
network contains eight inputs (eight attributes) and four hidden layers; each contains a variable number of neurons and six classes
(produced from the UMAIS algorithm). In Fig. 5, the learning factor is 0.005. Because it is a slow and solid learning process, the
loss function of a particular neural network never increases after reaching an optimal value. However, in Fig. 7, the neural network
architecture contains four hidden layers, each with a variable number of neurons, and the learning factor is set to a higher value. This disrupts the learning process, even though the number of layers increased. The loss function increases and decreases during
execution. This implies a trade-off between the number of neurons and the learning factor in the FFNN.
Fig. 8 shows the effect of linear and DNN classifiers on the powerset produced from the UMAIS algorithm. The linear classifier is
executed on the test data, by maintaining the parameters shuffle as False, and num_epochs as 1. Earlier, the training dataset was used
with the following settings: shuffle as True and num_epochs as 50. From Tables 5 and 6, the powersets ps02, ps07 and ps11 exhibit
a significant deviation between the metrics of classification. This deviation is observed when the number of training instances is
lower for any of the classes compared to the other classes. An uneven distribution of data also results in such an observation. In such cases,
linear classifiers are not appropriate for classification tasks. Compared to linear classifiers, the DNN exhibits proper weight updates


Fig. 6. Variation of loss function with the learning factor 𝛼 in FFNN.

Fig. 7. Loss function versus iterations with variable number of hidden layer neurons on a four-layered architecture with a high learning factor.

Fig. 8. Accuracy comparison of linear and DNN classifiers on powerset dataframes.


Fig. 9. Automata transition accuracies of linear and DNN classifier.

Fig. 10. Visualization of representational automata transitions by linear classifier.

even when the training instances are distributed unevenly (see Fig. 8). Even though the powerset ps11 exhibits deviations in the
DNN, it attains justifiable values for the classifier metrics. The DNN classifier is executed with three hidden layers, each containing
100 neurons, and the number of training steps is set to 1000.
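The exact implementation is not given in the paper; the sketch below shows one plausible reconstruction using the TensorFlow 1.x estimator API, whose numpy_input_fn exposes the shuffle and num_epochs parameters mentioned above. The arrays X_train, y_train, X_test, and y_test are hypothetical placeholders for a powerset dataframe and its UMAIS labels.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x; tf.estimator.inputs was removed in 2.x

# Hypothetical placeholders for one powerset dataframe and its UMAIS class labels.
X_train = np.random.rand(800, 4).astype(np.float32)
y_train = np.random.randint(0, 6, size=800)
X_test = np.random.rand(200, 4).astype(np.float32)
y_test = np.random.randint(0, 6, size=200)

feature_columns = [tf.feature_column.numeric_column('x', shape=[X_train.shape[1]])]

# Training input: shuffle=True, num_epochs=50; test input: shuffle=False, num_epochs=1.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': X_train}, y=y_train, batch_size=32, num_epochs=50, shuffle=True)
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x': X_test}, y=y_test, num_epochs=1, shuffle=False)

# Linear classifier baseline.
linear = tf.estimator.LinearClassifier(feature_columns=feature_columns, n_classes=6)
linear.train(input_fn=train_input_fn)
print('linear', linear.evaluate(input_fn=test_input_fn))

# DNN classifier: three hidden layers of 100 neurons each, 1000 training steps.
dnn = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                 hidden_units=[100, 100, 100], n_classes=6)
dnn.train(input_fn=train_input_fn, steps=1000)
print('dnn', dnn.evaluate(input_fn=test_input_fn))
```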

4.1.2. Representational automata generation and visualization of energy data


Representational automata generation is based on the transitions of powersets on less-correlated attributes. The experiments on
these transitions are shown in Fig. 9, which illustrates the effectiveness of the DNN in forming the automata. The histogram
representation in Fig. 9 shows the distribution of the non-correlated attributes over the powerset attributes.
The transitions of the powerset states on less-correlated attributes are distributed among the ITS, ETS, and DTS. The linear-classifier-based
transition accuracy details are shown in Table 7. Fig. 10 shows the representational automata generated based on the linear classifier.
We observe that the linear-classifier-based automata contain fewer transitions to the ETS state, as the accuracies of the modified powersets
(MPS) may vary significantly. The states 0, 2 and 9 show a decrease in accuracy on transition with the non-correlated attributes. All other
states have at least one transition to the ITS with a non-correlated attribute. The state '6', which is the combination of cyclecost, minuptime,
mindowntime and status, produces a greater number of transitions to the increased transition state. This reveals the nature of attribute combinations
in performing classification and also helps to identify the attribute combinations that need to be considered in real-time
scenarios to obtain better classification results.
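The transition rule itself can be written down compactly. The sketch below assumes a hypothetical helper evaluate_accuracy() that trains and evaluates a classifier on a given attribute subset; each modified powerset (a correlated-attribute set plus one non-correlated attribute) is then labeled ITS, ETS, or DTS by comparing its accuracy with that of the base powerset.

```python
def transition_state(base_acc, modified_acc, tol=1e-6):
    """Label the transition of a modified powerset relative to its base powerset."""
    if modified_acc > base_acc + tol:
        return 'ITS'   # increased transition state
    if modified_acc < base_acc - tol:
        return 'DTS'   # decreased transition state
    return 'ETS'       # equal transition state

def build_transitions(powersets, non_correlated, evaluate_accuracy):
    """Return {(state, attribute): (accuracy, transition)} for every modified powerset.

    `powersets` maps a state id to its set of correlated attributes;
    `evaluate_accuracy(attributes)` is a hypothetical helper that trains a
    classifier on the given attribute subset and returns its test accuracy.
    """
    transitions = {}
    for state, attrs in powersets.items():
        base_acc = evaluate_accuracy(attrs)
        for extra in non_correlated:
            mod_acc = evaluate_accuracy(attrs | {extra})
            transitions[(state, extra)] = (mod_acc, transition_state(base_acc, mod_acc))
    return transitions
```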


Fig. 11. Visualization of representational automata transitions by DNN classifier.

Table 7
Accuracy of linear classifier based powerset transitions on non-correlated attributes.
Powerset Latitude Lincost Capacity Longitude
ps01 0.64375 0.6875 0.934375 0.528125
ps02 0.690625 0.753125 0.934375 0.50625
ps03 0.934375 0.746875 0.934375 0.69375
ps04 0.934375 0.934375 0.86875 0.934375
ps05 0.934375 0.934375 0.690625 0.925
ps06 0.934375 0.934375 0.825 0.915625
ps07 0.925 0.928125 0.934375 0.459375
ps08 0.934375 0.85625 0.45 0.884375
ps09 0.76875 0.928125 0.934375 0.925
ps10 0.8125 0.76875 0.865625 0.909375
ps11 0.934375 0.896875 0.934375 0.925

Table 8
Accuracy of DNN based powerset transitions on non-correlated attributes.
Powerset Latitude Lincost Capacity Longitude
ps01 0.89375 0.925 0.928125 0.928125
ps02 0.928125 0.928125 0.928125 0.89375
ps03 0.89375 0.925 0.925 0.915625
ps04 0.934375 0.934375 0.934375 0.934375
ps05 0.934375 0.921875 0.91875 0.93125
ps06 0.934375 0.93125 0.921875 0.93125
ps07 0.89375 0.89375 0.928125 0.925
ps08 0.9 0.921875 0.928125 0.928125
ps09 0.928125 0.928125 0.928125 0.915625
ps10 0.934375 0.909375 0.934375 0.93125
ps11 0.915625 0.928125 0.925 0.928125

The DNN-based transition accuracy details are shown in Table 8. In Fig. 11, more transitions to the ETS can be observed, as the
variations in accuracy on the transitions with less-correlated attributes remain almost the same. The 11 states are numbered from
0 to 10. Each state is a combination of correlated attributes with the class label. The internal structure of the generator_info file can be
viewed from the representational automata. For instance, state 1 is the combination of { cyclecost, mindowntime,
status }. Every state contains four transitions represented by edges labeled with the respective less-correlated attribute. The states 2,
10 and 6 of Figs. 10 and 11 show the same transitions for both the DNN and linear classifiers. The DNN classifier produces more transitions
to the ITS than the linear classifier, revealing that the DNN classifier yields better classification even in the presence of non-correlated
attributes.
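The automata of Figs. 10 and 11 can be rendered directly from such a transition table. The following is an illustrative sketch using networkx and matplotlib; the transitions dictionary is assumed to have the shape produced by the build_transitions() sketch above, and the layout and styling choices are arbitrary.

```python
import networkx as nx
import matplotlib.pyplot as plt

def draw_automaton(transitions):
    """Draw one representational automaton from {(state, attribute): (acc, label)}."""
    g = nx.DiGraph()
    edge_labels = {}
    for (state, attribute), (_, trans) in transitions.items():
        g.add_edge(state, trans)
        # Collect every non-correlated attribute that drives the same transition,
        # so that parallel edges are shown as one edge with a combined label.
        edge_labels.setdefault((state, trans), []).append(attribute)

    pos = nx.spring_layout(g, seed=0)
    nx.draw_networkx(g, pos, node_size=600, node_color='lightgrey')
    nx.draw_networkx_edge_labels(
        g, pos, edge_labels={k: ', '.join(v) for k, v in edge_labels.items()})
    plt.axis('off')
    plt.show()
```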


Table 9
Description of attributes in bankruptcy dataset.
Sl. No. Attribute Description
1 RE1 Profit before Tax/Shareholders’ Funds
2 RE2 Net Income/Shareholders’ Funds
3 RE3 EBITDA/Total Assets
4 RE4 EBITDA/Permanent Assets
5 RE5 EBIT/Total Assets
6 RE6 Net Income/Total Assets
7 EF1 Value Added/Total Sales
8 EF2 Total Sales/Shareholders’ Funds
9 EF3 EBIT/Total Sales
10 EF4 Total Sales/Total Assets
11 EF5 Gross Trading Profit/Total Sales
12 EF7 Operating Cash Flow/Total Assets
13 EF8 Operating Cash Flow/Total Sales
14 PR1 Financial Expenses/Total Sales
15 PR2 Labor Expenses/Total Sales

Fig. 12. Accuracy classification metrics of powersets with linear classifier.

4.2. Case study 2

In our second case study, we analyzed the performance of our algorithm over banking data that are related to bankruptcy
prediction. The dataset for bankruptcy prediction contains 40 attributes that are the indicators from several anonymous banking
sector companies from 2002 [47]. The status of each set of attributes belongs to class 1 or class 2. Class 1 describes a healthy set of
attribute values, and Class 2 belongs to the bankrupt group. In the initial experimentation, we selected the first 15 attributes from
the dataset. Table 9 shows the description of the 15 selected attributes. The attributes 2, 3, 4, 5 and 9 are variants of earnings
before interest, tax, depreciation and amortization (EBITDA) data. All these attributes show the performance of the company.
Among the attributes, six are correlated and nine are less correlated. By applying Algorithm 3.2, a powerset is formed with the
correlated attributes. Subsequent experimentation is performed over the powerset with the algorithms explained in Section 3.
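The separation of correlated from less-correlated attributes and the subsequent powerset formation can be sketched as below; the correlation threshold of 0.5 is a hypothetical choice, and the APSG algorithm of Section 3 defines the actual criterion.

```python
from itertools import combinations
import pandas as pd

def split_attributes(df: pd.DataFrame, threshold: float = 0.5):
    """Split numeric attributes into correlated and less-correlated sets.

    `df` holds only the numeric attribute columns; `threshold` is hypothetical.
    """
    corr = df.corr().abs()
    correlated, less_correlated = [], []
    for col in df.columns:
        # An attribute counts as correlated if it exceeds the threshold
        # with at least one other attribute (self-correlation excluded).
        if (corr[col].drop(col) >= threshold).any():
            correlated.append(col)
        else:
            less_correlated.append(col)
    return correlated, less_correlated

def attribute_powerset(correlated):
    """Generate all non-empty subsets of the correlated attributes."""
    return [set(c) for r in range(1, len(correlated) + 1)
            for c in combinations(correlated, r)]
```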

4.2.1. Implications of FFNN, and linear and DNN classifiers on bankruptcy data
The accuracy metrics of classification with linear and DNN classifiers are shown in Figs. 12 and 13, respectively. The linear
classifier exhibits more variations in precision and recall compared to the DNN classifier.
The linear classifier in Fig. 12 shows variations in precision and recall, as well as an average accuracy of 85%, except for a few
powersets whose metrics vary across the range of 50% to 100%. This shows the influence of less-correlated attributes on the classification accuracy.
The DNN classifier in Fig. 13 shows a similar growth pattern in the metrics, with increasing values. The precision and recall exhibit less
difference during their growth in comparison with the linear classifier.
The FFNN is designed with the input powerset attributes classified into two output classes. Three hidden layers, each with a variable
number of neurons, are used for bankruptcy prediction. Fig. 14 shows the variations in the loss values and
accuracy with a change in the hidden layer neurons. It is clear that the number of neurons is not always directly proportional to
the accuracy, nor inversely proportional to the loss function. An optimal number of neurons exists in the hidden layer, after which
the behavior of the network becomes unpredictable. This is evident from Fig. 14, where the accuracy and loss function change abruptly
after the number of neurons exceeds 20.
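A minimal sketch of this neuron-count sweep, assuming a tf.keras (TensorFlow 2.x) model with three hidden layers and a two-class softmax output, is given below; the attribute matrix X and binary labels y are random placeholders for the bankruptcy data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical placeholders for the 15 bankruptcy attributes and the two classes.
X = np.random.rand(600, 15).astype(np.float32)
y = np.random.randint(0, 2, size=600)

for n_neurons in (5, 10, 20, 40, 80):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(X.shape[1],)),
        tf.keras.layers.Dense(n_neurons, activation='relu'),
        tf.keras.layers.Dense(n_neurons, activation='relu'),
        tf.keras.layers.Dense(n_neurons, activation='relu'),
        tf.keras.layers.Dense(2, activation='softmax'),   # healthy vs. bankrupt
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(X, y, epochs=50, verbose=0)
    # Final loss and accuracy for this neuron count, as compared in Fig. 14.
    print(n_neurons, history.history['loss'][-1], history.history['accuracy'][-1])
```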


Fig. 13. Accuracy classification metrics of powersets with DNN classifier.

Fig. 14. The influence of neurons on accuracy and loss function of FFNN classifier.

4.2.2. Representational automata generation and visualization of bankruptcy data


The representational automata for the bankruptcy data are illustrated in Figs. 15 and 16. Fig. 15 shows the automata generated with
the linear classifier. For the bankruptcy data, 26 powersets exist, each exhibiting transitions on nine less-correlated attributes. Fig. 16
shows the representational automata generated with the DNN classifier.
The DNN classifier contains fewer transitions to the ETS compared to the similar transitions in the linear classifier.
A greater number of transitions to the ITS demonstrates the influence of the less-correlated attributes on the normal powerset.

4.3. Comparison of energy and bankruptcy data with standard classifiers

Table 10 summarizes the classification accuracies of the DNN and linear classifiers compared to those of a set of standard classifiers.
Fig. 17 shows the comparison of each standard classifier with the DNN and linear classifiers on both datasets. We considered the
classification accuracy of the modified powersets of each dataset for the comparison, because the automata representation
is generated from the modified powerset transitions (see Algorithm 5). The standard classifiers that we considered are KNeighbors,
support vector, decision tree, random forest, multilayer perceptron, AdaBoost, Gaussian naive Bayes, and quadratic discriminant
analysis. Random forest and the DNN exhibit better classification performance than the others. Random forest classifiers show better
performance with less training data; when the number of data samples increases, the number of trees also increases, which makes the algorithm
computationally expensive. The DNN classifier improves the feature learning process with an increase in training samples. If the number
of attributes or features is large, then DNN classifiers are best suited for learning the impact of attribute combinations.
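The comparison in Table 10 follows the usual scikit-learn pattern of fitting a list of standard classifiers on the same train/test split; a minimal sketch with default hyperparameters is shown below, where X and y stand for a modified powerset dataframe and its labels.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

classifiers = {
    'KNeighbors': KNeighborsClassifier(),
    'SVC': SVC(),
    'DecisionTree': DecisionTreeClassifier(),
    'RandomForest': RandomForestClassifier(),
    'MLP': MLPClassifier(max_iter=1000),
    'AdaBoost': AdaBoostClassifier(),
    'GaussianNB': GaussianNB(),
    'QDA': QuadraticDiscriminantAnalysis(),
}

def compare(X, y):
    """Fit each standard classifier on the same split and report test accuracy (%)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    for name, clf in classifiers.items():
        clf.fit(X_tr, y_tr)
        print(name, round(clf.score(X_te, y_te) * 100, 2))
```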

5. Conclusion and future scope

In this study, we have developed general representational automata using DNNs. These automata give a quick summary of the
underlying attribute relations and present the operational characteristics of various attribute combinations through transition diagrams
and new output states, which can be used in other machine learning methods. Representational automata help to reveal the nature


Fig. 15. Representational automata for bankruptcy data generated by linear classifier.

Fig. 16. Representational automata for bankruptcy data generated by DNN classifier.

of attribute relationships by combining correlated and non-correlated attributes. Our method provides a quick selection of attribute
combinations that favor an increased classification accuracy. The importance of non-correlated attributes in a dataset
can be viewed from our representational structure through transitions. The three new states of our transition diagram illustrate the
influence of non-correlated attributes on correlated attributes. These transitions can point to different possibilities of improvement
in design methodologies, fault identification, relevant attribute selection, etc.


Table 10
Accuracy comparison of classifiers [48] on energy and bankruptcy data.
Classifier Energy Bankruptcy
KNeighbors 90.73 80.79
SVC 91.5 71.41
DecisionTree 91.83 85.85
RandomForest 93.160 87.81
MLP 69.23 82.96
AdaBoost 82.73 86.34
GaussianNB 89.98 77.53
QDA 75.41 75.01
DNN 93.2 87
Linear 74.2 82

Fig. 17. Classifier accuracy comparison for energy and bankruptcy data.

Representational automata are visual representations that originate from dense machine learning concepts. In addition to other
visualization aids such as decision trees and dendrograms, automata diagrams stay closer to the behavior of the data by following
the traditional steps of learning. Various operational rules can be formed using relevant attribute combinations. Finding the
most appropriate rules and validating them is challenging. Further implications of automata visualization must be studied with respect to
labeling algorithms, other attribute combination methods, the combination of less-correlated attributes with powersets, improvements in
visualization capabilities, and other statistical metrics, in order to identify any unrecognized patterns or information. An additional concern
about this representational method is the number of attributes. If the number of attributes increases, then
the number of powersets increases exponentially, and the time required in such cases also increases significantly. When the number
of less-correlated attributes increases, the number of transitions grows in geometric progression. Addressing such
issues is challenging when attempting to form a quick representational structure.

Acknowledgments

This research received funding from the Netherlands Organization for Scientific Research (NWO), The Netherlands in the
framework of the Indo Dutch Science Industry Collaboration Program in relation to project NextGenSmartDC (629.002.102).

References

[1] Regression and Classification. http://katbailey.github.io/post/from-both-sides-now-the-math-of-linear-regression. (Accessed 19 January 2018).
[2] Deep Neural Networks. http://neuralnetworksanddeeplearning.com. (Accessed 14 January 2018).
[3] C. Wen, L. Tao, Parameter analysis of negative selection algorithm, Inform. Sci. 420 (2017) 218–234.
[4] A. Ali, F. Yangyu, Unsupervised feature learning and automatic modulation classification using deep learning model, Phys. Commun. 25 (2017) 75–84.
[5] J. Xu, B. Tang, H. He, H. Man, Semisupervised feature selection based on relevance and redundancy criteria, IEEE Trans. Neural Netw. Learn. Syst. 28
(9) (2017) 1974–1984.
[6] A.K. Jain, R.P.W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE Trans. Pattern Anal. Mach. Intell. 22 (1) (2000) 4–37.
[7] M. Pérez-Ortiz, S. Jiménez-Fernández, P.A. Gutierrez, E. Alexandre, C. Hervás-Martínez, S. Salcedo-Sanz, A review of classification problems and algorithms
in renewable energy applications, Energies 9 (2016) 1–27.
[8] G. Trigeorgis, K. Bousmalis, S. Zafeiriou, B.W. Schuller, A deep matrix factorization method for learning attribute representations, IEEE Trans. Pattern
Anal. Mach. Intell. 39 (3) (2017) 417–429.
[9] S. Benzekry, C. Lamont, A. Beheshti, A. Tracz, J.M.L. Ebos, L. Hlatky, P. Hahnfeldt, Classical mathematical models for description and prediction of
experimental tumor growth, PLoS Comput. Biol. 10 (8) (2014) 1–19.


[10] V.M. Dalfard, M.N. Asli, S.M. Asadzadeh, S.M. Sajjadi, A. Nazari-Shirkouhi, A mathematical modeling for incorporating energy price hikes into total natural
gas consumption forecasting, Appl. Math. Model. 37 (8) (2013) 5664–5679.
[11] Z. Liu, B. Li, Z. Pei, K. Qin, Formal concept analysis via multi-granulation attributes, in: 2017 12th International Conference on Intelligent Systems and
Knowledge Engineering (ISKE), 2017, pp. 1–6.
[12] Relevant Feature Selection. https://www.datacamp.com. (Accessed 16 February 2018).
[13] C. Qin, S. Song, G. Huang, L. Zhu, Unsupervised neighborhood component analysis for clustering, Neurocomputing 168 (2015) 609–617.
[14] L. Zhou, S. Pan, J. Wang, A.V. Vasilakos, Machine learning on big data: opportunities and challenges, Neurocomputing 237 (2017) 350–361.
[15] B. Gabrys, L. Petrakieva, Combining labelled and unlabelled data in the design of pattern classification systems, in: Integration of Methods and Hybrid
Systems, Internat. J. Approx. Reason. 35 (3) (2004) 251–273.
[16] D. Lam, M. Wei, D. Wunsch, Clustering data of mixed categorical and numerical type with unsupervised feature learning, IEEE Access 3 (2015) 1605–1613.
[17] Deep Learning Neural Networks. http://textminingonline.com/dive-into-tensorflow-part-vi-beyond-deep-learning. (Accessed 13 January 2018).
[18] G. Zhong, L. Wang, J. Dong, An overview on data representation learning: from traditional feature learning to recent deep learning, Finance and Data
Sci. 4 (2016) 265–278.
[19] I.T. Jolliffe, J. Cadima, Principal component analysis: a review and recent developments, Philos. Trans. Math. Phys. Eng. Sci. 374 (2016).
[20] N. Gkalelis, V. Mezaris, Incremental accelerated kernel discriminant analysis, in: Proceedings of the 2017 ACM on Multimedia Conference, MM’17, 2017,
pp. 1575–1583.
[21] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2) (2004) 91–110.
[22] M.-R. Bouguelia, S. Pashami, S. Nowaczyk, Multi-task representation learning, in: 30th Annual Workshop of the Swedish Artificial Intelligence Society
SAIS 2017, no. 137, Linkoping University Electronic Press, 2017, pp. 53–59.
[23] G. Feng, H. Li, J. Dong, J. Zhang, Effective classification of 2DPCA and 2DLDA features for face recognition, in: 2017 Chinese Automation Congress (CAC),
2017, pp. 4773–4777.
[24] E. Oja, Applications of independent component analysis, in: Neural Information Processing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2004, pp.
1044–1051.
[25] J.B. Tenenbaum, Mapping a manifold of perceptual observations, in: Advances in Neural Information Processing Systems 10, MIT Press, 1998, pp. 682–688.
[26] J. Krupka, P. Jirava, Rough-fuzzy classifier modeling using data repository sets, Procedia Comput. Sci. 35 (2014) 701–709.
[27] J.F. Sánchez-Rada, C.A. Iglesias, Onyx: a linked data approach to emotion representation, Inf. Process. Manage. 52 (1) (2016) 99–114.
[28] H. Li, J. Zhang, J. Hu, C. Zhang, J. Liu, Graph-based discriminative concept factorization for data representation, Knowl. Based Syst. 118 (C) (2017)
70–79.
[29] S. Dumančić, H. Blockeel, Clustering-based relational unsupervised representation learning with an explicit distributed representation, in: Proceedings of
the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, 2017, pp. 1631–1637.
[30] S. Cao, W. Lu, Q. Xu, Grarep: learning graph representations with global structural information, in: Proceedings of the 24th ACM International on Conference
on Information and Knowledge Management, ACM, 2015, pp. 891–900.
[31] C.H. Lee, An information-theoretic filter approach for value weighted classification learning in naive bayes, Data Knowl. Eng. 113 (2018) 116–128.
[32] J. Luo, H. Yan, Y. Yuan, A novel representation in three-dimensions for high dimensional data sets, Data Knowl. Eng. 117 (2018) 37–52.
[33] A. Esmaeilzehi, H.A. Moghaddam, Nonparametric kernel sparse representation-based classifier, Pattern Recognit. Lett. 89 (2017) 46–52.
[34] F. Huang, X. Zhang, Z. Zhao, Z. Li, Y. He, Deep multi-view representation learning for social images, Appl. Soft Comput. 73 (2018) 106–118.
[35] S. Deepika, T. Geetha, A meta-learning framework using representation learning to predict drug-drug interaction, J. Biomed. Inform. 84 (2018) 136–147.
[36] A. Tavanaei, T. Masquelier, A. Maida, Representation learning using event-based stdp, Neural Netw. 105 (2018) 294–303.
[37] M. Mhiri, S. Abuelwafa, C. Desrosiers, M. Cheriet, Hierarchical representation learning using spherical k-means for segmentation-free word spotting, Pattern
Recognit. Lett. 101 (2018) 52–59.
[38] H. Peng, M. Bao, J. Li, M. Bhuiyan, Y. Liu, Y. He, E. Yang, Incremental term representation learning for social network analysis, Future Gener. Comput.
Syst. 86 (2018) 1503–1512.
[39] M. Alam, L. Vidyaratne, K. Iftekharuddin, Novel deep generative simultaneous recurrent model for efficient representation learning, Neural Netw. 107
(2018) 12–22.
[40] Y. Yuan, G. Xun, Q. Suo, K. Jia, A. Zhang, Wave2vec: deep representation learning for clinical temporal data, Neurocomputing 324 (2019) 31–42.
[41] T. Lesort, N. Díaz-Rodríguez, J.-F. Goudou, D. Filliat, State representation learning for control: an overview, Neural Netw. 108 (2018) 379–392.
[42] F. Anselmi, G. Evangelopoulos, L. Rosasco, T. Poggio, Symmetry-adapted representation learning, Pattern Recognit. 86 (2019) 201–208.
[43] W. Xiong, Z. Lu, B. Li, B. Hang, Z. Wu, Automating smart recommendation from natural language api descriptions via representation learning, Future
Gener. Comput. Syst. 87 (2018) 382–391.
[44] Powerset. https://www.mathsisfun.com/sets/powerset.html. (Accessed 27 May 2018).
[45] Categorical attribute and class labels. https://towardsdatascience.com/understanding-feature-engineering-part-2-categorical-data-f54324193e63. (Accessed
28 March 2018).
[46] Energy dataset. https://zenodo.org/record/999150. (Accessed 20 January 2018).
[47] Bankruptcy dataset. http://research.cs.aalto.fi/aml/datasets/financialratios.data. (Accessed 19 March 2018).
[48] Classifiers. https://scikit-learn.org/stable/supervised_learning.html#supervised-learning. (Accessed 23 April 2018).

Johnpaul C. I received a B-Tech in Computer Science and Engineering from Calicut University, India in 2010, and an M-Tech from
Amrita Vishwavidyapeetham, Coimbatore, India in 2013. He worked for two years as a software engineer at Cognizant Technologies,
Bengaluru, and for two years as an Assistant Professor at the National Institute of Engineering, Mysore, India. He is currently a doctoral
candidate in the Department of Computer Applications at the National Institute of Technology, Tiruchirappalli, India and at the Institute for
Development & Research in Banking Technology (IDRBT), Hyderabad, India. His current research interests include data
mining, machine learning and energy informatics.


Munaga V N K Prasad received his doctoral degree from the Institute of Technology, Banaras Hindu University, India. Currently, he
is an Associate Professor at the Institute for Development and Research in Banking Technology, Hyderabad, India. He has published
a number of papers in international and national journals and conferences. His research interests include biometrics, payment system
technologies and data hiding. He is a senior member of IEEE and ACM.

S. Nickolas received his M.E. and Ph.D. degrees in Computer Science from the National Institute of Technology (NITT), Tiruchirappalli, India
in 1992 and 2007, respectively. He joined the Department of Computer Applications at NITT in 1988 as a faculty member. Currently,
he is working as a Professor in the Department of Computer Applications at NITT and is the Professor in charge of the CUDA-NVIDIA Lab
of the Institute. He has published 20 research articles in international conferences and journals. His research areas include data
mining, big data analysis, distributed computing, cloud computing and software metrics. He is a member of professional bodies such as
ACM, ISTE and IE.

G. R. Gangadharan is an Associate professor at National Institute of Technology, Tiruchirappalli, India. His research interests focus
on the interface between technological and business perspectives. Gangadharan received his PhD in information and communication
technology from the University of Trento, Italy, and the European University Association. He is a senior member of IEEE and ACM.
Contact him at geeyaar@gmail.com.
