Advances in Intelligent Systems and Computing 798
Computational Intelligence: Theories, Applications and Future Directions—Volume I
ICCI-2017
Advances in Intelligent Systems and Computing
Volume 798
Series editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
The series “Advances in Intelligent Systems and Computing” contains publications on theory,
applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all
disciplines such as engineering, natural sciences, computer and information science, ICT, economics,
business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the
areas of modern intelligent systems and computing such as: computational intelligence, soft computing
including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms,
social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and
society, cognitive science and systems, perception and vision, DNA and immune-based systems,
self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics including
human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent
data analysis, knowledge management, intelligent agents, intelligent decision making and support,
intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings
of important conferences, symposia and congresses. They cover significant recent developments in the
field, both of a foundational and applicable character. An important characteristic feature of the series is
the short publication time and world-wide distribution. This permits a rapid and broad dissemination of
research results.
Advisory Board
Chairman
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
e-mail: nikhil@isical.ac.in
Members
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central “Marta
Abreu” de Las Villas, Santa Clara, Cuba
e-mail: rbellop@uclv.edu.cu
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
e-mail: escorchado@usal.es
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester,
UK
e-mail: hani@essex.ac.uk
László T. Kóczy, Department of Automation, Széchenyi István University, Győr, Hungary
e-mail: koczy@sze.hu
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
e-mail: vladik@utep.edu
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
e-mail: ctlin@mail.nctu.edu.tw
Jie Lu, Faculty of Engineering and Information, University of Technology, Sydney, NSW, Australia
e-mail: Jie.Lu@uts.edu.au
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana,
Mexico
e-mail: epmelin@hafsamx.org
Nadia Nedjah, Department of Electronics Engineering, State University of Rio de Janeiro, Rio de Janeiro,
Brazil
e-mail: nadia@eng.uerj.br
Ngoc Thanh Nguyen, Wrocław University of Technology, Wrocław, Poland
e-mail: Ngoc-Thanh.Nguyen@pwr.edu.pl
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong
Kong, Shatin, Hong Kong
e-mail: jwang@mae.cuhk.edu.hk
Editors
Computational Intelligence: Theories, Applications and Future Directions—Volume I
ICCI-2017
Editors
Nishchal K. Verma, Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
A. K. Ghosh, Department of Aerospace Engineering, Indian Institute of Technology Kanpur, Kanpur, Uttar Pradesh, India
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
Contents
Part II Bioinformatics
High-Dimensional Data Classification Using PSO
and Bat Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Viplove Divyasheesh and Anil Pandey
Feature Learning Using Stacked Autoencoder for Shared
and Multimodal Fusion of Medical Images . . . . . . . . . . . . . . . . . . . . . . 53
Vikas Singh, Nishchal K. Verma, Zeeshan Ul Islam and Yan Cui
A New Computational Approach to Identify Essential Genes
in Bacterial Organisms Using Machine Learning . . . . . . . . . . . . . . . . . . 67
Ankur Singhal, Devasheesh Roy, Somit Mittal, Joydip Dhar
and Anuraj Singh
Automatic ECG Signals Recognition Based on Time Domain
Features Extraction Using Fiducial Mean Square Algorithm . . . . . . . . . 81
V. Vijendra and Meghana Kulkarni
Part I
Big Data Analytics
Analysis of Weather Data Using Forecasting Algorithms
Abstract Predictive analytics is a current focus not only in business applications but also in all types of applications that involve the prediction of future outcomes. This has led to the development of various prediction algorithms in the domains of machine learning, data mining, and forecasting. This paper focuses on analyzing the data pattern and its behavior using univariate forecasting models. Temperature is taken as the univariate observation from a weather dataset, and its forecast value is predicted using forecasting algorithms. The predicted forecast value is compared with real-time data, from which it is observed that the level component plays a larger role than the trend and seasonal components in real-time data, and that the predicted forecast value does not depend on the size of the dataset.
1 Introduction
Data collected from sources such as sensors, satellites, social networks, online transactions, etc. is mostly unstructured or semi-structured and is nowadays termed big data. Such data is not stored in rows and columns with defined data types in a data warehouse because of its unstructured nature. Hence, scalable racks of disks with parallel and distributed computing in a high-processing environment, called a hyperscale computing environment, have been set up by Google, Apple, Facebook, etc., to handle such data. The received data is stored as raw data in disks with minor preprocessing: denoising removes noise, and feature extraction obtains the most relevant data by omitting irrelevant data. For analytics such as predictive and prescriptive analytics on big data, these preprocessing steps form the basis for obtaining good results with a satisfactory level of accuracy. The basic concepts of big data and the various types of analytics are explained briefly in our previously published paper [1].
Predictive analytics deals with determining forecast values based on previous observations. Many prediction algorithms, such as data mining algorithms, statistical methods, forecasting methods, and time series methods, are used in real time to predict future occurrences in applications such as the share market in business, student performance in education, player attempts in sports, weather prediction, etc. Some of the forecast models are as follows:
(i) Simple mean: The average of the previous observations is taken as the forecast value. The simple mean can be applied to datasets that have a controlled pattern, i.e., that do not deviate much beyond a certain level (such deviating values are outliers [2]). Since all observations are taken into consideration when calculating the mean, even one deviating value in the dataset may affect the final mean value. It is given by the following formula:

F_{n+1} = (1/n) * Σ_{i=1}^{n} Y_i
(ii) Moving average: The forecast value is the mean of the k most recent observations:

F_{n+1} = (1/k) * Σ_{i=n−k+1}^{n} Y_i
(iii) Simple exponential smoothing: The forecast is updated by applying a smoothing constant α to the most recent forecast error [4]:

F_{n+1} = F_n + α(Y_n − F_n)

The α value gives substantial control over the level and the forecast error. The α value can also be made adaptive: instead of using a fixed value, it can be changed with respect to changes in the pattern of the data.
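The forecasting formulas above can be sketched in Python (illustrative only; the paper's implementation uses R, and the sample temperatures below are hypothetical):

```python
def simple_mean(y):
    # F_{n+1}: average of all previous observations
    return sum(y) / len(y)

def moving_average(y, k):
    # F_{n+1}: average of the k most recent observations
    return sum(y[-k:]) / k

def exponential_smoothing(y, alpha):
    # F_{n+1} = F_n + alpha * (Y_n - F_n), with F_1 initialized to Y_1
    f = y[0]
    for obs in y:
        f = f + alpha * (obs - f)
    return f

temps = [24.0, 25.5, 23.8, 26.1, 25.0, 24.4]   # hypothetical daily averages
print(round(simple_mean(temps), 2))        # -> 24.8
print(round(moving_average(temps, 3), 2))  # -> 25.17
```

With α = 1 the exponential smoothing forecast collapses to the last observation, which is one way to see how α trades responsiveness against smoothing.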
(iv) Holt’s linear method: This method finds the forecast value by considering the level and trend of the data pattern. Thus, it uses two smoothing constants, α and β, to control the level and trend of the data [5]. The formula for Holt’s linear method is given by

F_{n+m} = L_n + b_n * m

where L_n is the level of the series for n observations, b_n is the trend of the series, and m is the number of periods to forecast ahead.
(v) Holt-Winters method: This method predicts the forecast value using a seasonal component in addition to the level and trend [6, 7]. Hence, it uses three constants, α, β, and γ, for the level, trend, and seasonal adjustments. The formula for the Holt-Winters method is given by

F_{n+m} = (L_n + b_n * m) * S_{n−s+m}

where S is the seasonal component and s is the length of the seasonal cycle.
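The Holt and Holt-Winters forecasts depend on level, trend, and seasonal update recursions that the paper does not reproduce; the sketch below uses the standard textbook forms, and the initialization choices (trend from the first difference, seasonal indices set to 1) are our own simplifications, not the authors' method:

```python
def holt_linear(y, alpha, beta, m):
    # Standard Holt recursions:
    #   L_n = alpha*Y_n + (1-alpha)*(L_{n-1} + b_{n-1})
    #   b_n = beta*(L_n - L_{n-1}) + (1-beta)*b_{n-1}
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast: F_{n+m} = L_n + b_n * m
    return level + trend * m

def holt_winters(y, alpha, beta, gamma, s, m):
    # Multiplicative seasonal form: F_{n+m} = (L_n + b_n*m) * S_{n-s+m}
    level, trend = y[0], 0.0
    seasonal = [1.0] * s  # simplified initialization
    for i, obs in enumerate(y[1:], start=1):
        prev_level = level
        si = seasonal[i % s]
        level = alpha * (obs / si) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[i % s] = gamma * (obs / level) + (1 - gamma) * si
    return (level + trend * m) * seasonal[(len(y) - s + m) % s]

# On a perfectly linear series, Holt's method recovers the line exactly:
print(holt_linear([1, 2, 3, 4, 5], 0.5, 0.5, 1))  # -> 6.0
```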
2 Implementation
The dataset used for this analysis was obtained from the Central Research Institute for Dryland Agriculture (CRIDA), a national research institute under the Indian Council of Agricultural Research (ICAR). The dataset contains weather data from the years 1958–2014, nearly 56 years of data, with 20,820 rows (days) and attributes such as temperature, humidity, wind speed, sunshine, etc. The temperature attribute is taken for analysis: the average temperature from the years 1958–2013 is used as input to the forecasting algorithms to predict the temperature of the year 2014. The real-time temperature (in Celsius) of the year 2014 is already available in the dataset, so it is compared with the value predicted by the algorithms and the percentage accuracy is determined. The data was collected from sensors, so it is not purely structured, and hence it was converted to a comma-separated values (CSV) file for easy access. The implementation is done using R, which contains various packages for statistical analysis and techniques. The dataset undergoes a few preprocessing steps so that it can be used with R functions. The preprocessing details are given below:
6 S. Poornima et al.
(i) The average temperature is read into a variable as a dataframe. This dataframe is converted to the DATE type to make time series conversion easier.
(ii) The dataframe is converted to a zoo object using the zoo() function in the zoo package (to be downloaded and installed).
(iii) The time series object is obtained using the ts() function.
The reason for converting the dataframe to a zoo object and then to a time series object, rather than directly to a time series, is that compatibility errors may occur in direct conversion when the frequency exceeds 12. This time series object can then be used with forecast functions such as rollmean() for moving averages, HoltWinters(), etc. The forecast functions are built-in functions in R that take several parameters, which can be set as per the needs of the research. In our implementation, every 18 years of data is analyzed separately as an individual phase to study the behavior of the data pattern at each stage, and the temperature values of the entire dataset are taken for prediction as the last phase.
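The same pipeline can be mirrored in plain Python for readers without R; the column names and the phase boundaries below are assumptions based on the text (the CRIDA file layout is not shown):

```python
import csv
from datetime import datetime

def load_series(path):
    # Read (date, average temperature) rows from the CSV export described
    # in the text. The column names "date" and "avg_temp" are assumptions.
    series = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            series.append((datetime.strptime(row["date"], "%Y-%m-%d"),
                           float(row["avg_temp"])))
    return series

def split_phases(series, boundaries=(1976, 1995)):
    # Group observations into the phases used in the paper:
    # 1958-1975 (phase I), 1976-1994 (phase II), 1995-2013 (phase III).
    phases = [[] for _ in range(len(boundaries) + 1)]
    for date, temp in series:
        idx = sum(date.year >= b for b in boundaries)
        phases[idx].append(temp)
    return phases
```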
3 Results
It is very difficult to analyze a real-time dataset, since real-time values do not always follow a particular pattern or range. The data may stay within a certain level for some interval of time or change abruptly at certain periods. As mentioned in the implementation part, the results of the various phases are given in Table 1.
3.1 Phase I
The results in the table clearly show that the accuracy is lower for the data pattern between the years 1958–1975, which means that the temperature values have much variation or dispersion with respect to the real-time data: the difference between the minimum and maximum values is high compared to the other phases. The minimum temperature in this phase is 8.6 °C and the maximum is 38.9 °C, a larger fluctuation than in the other phases; these also form the minimum and maximum temperature values for the entire dataset. This phase gains higher accuracy for the simple mean method than the other phases, as shown in Fig. 1, since the central tendency of this method (26.47 °C) is near the real-time temperature (24.15 °C). The reason is that the mean is calculated by considering all data points, which captures the degree of diversity in the phase, and the data points do not follow a level, trend, or seasonal pattern. All three components show almost the same range of accuracy, converging to low values (maximum 0.631 for level) compared to the other
Table 1 The results of the forecasting algorithms with the constant values (α, β, γ) used, the comparison of actual value with predicted value, and accuracy level in percentage

Algorithm applied | Actual value/Predicted fit | Alpha (α) | Beta (β) | Gamma (γ) | Accuracy (%)

Year 1958–2013 (complete dataset)
Simple mean | 19.85/26.67 | NA | NA | NA | 65.6
Moving averages (order 3 days) | 19.85/20.83 | NA | NA | NA | 90.26
Holt-Winters (only level) | 19.85/18.26 | 0.7428 | NA | NA | 92.00
Holt-Winters (level and trend) | 19.85/18.23 | 0.7727 | 0.01573 | False | 91.87
Holt-Winters (level and seasonal) | 19.85/18.25 | 0.262 | False | 0.0550 | 91.96
Holt-Winters (level, trend, and seasonal) | 19.85/17.78 | 0.262 | 0.01558 | 0.0492 | 89.6

Year 1958–1975 (Phase I)
Simple mean | 24.15/26.47 | NA | NA | NA | 98.09
Moving averages (order 3 days) | 24.15/18.68 | NA | NA | NA | 77.36
Holt-Winters (only level) | 24.15/18.63 | 0.6317 | NA | NA | 77.14
Holt-Winters (level and trend) | 24.15/18.61 | 0.6911 | 0.03936 | False | 77.06
Holt-Winters (level and seasonal) | 24.15/18.86 | 0.5953 | False | 0.1372 | 78.13
Holt-Winters (level, trend, and seasonal) | 24.15/18.17 | 0.6031 | 0.00363 | 0.1389 | 75.31

Year 1976–1994 (Phase II)
Simple mean | 17.55/26.61 | NA | NA | NA | 48.33
Moving averages (order 3 days) | 17.55/19.47 | NA | NA | NA | 89.04
Holt-Winters (only level) | 17.55/17.70 | 0.8279 | NA | NA | 94.25
Holt-Winters (level and trend) | 17.55/17.68 | 0.8343 | 0.00473 | False | 99.22
Holt-Winters (level and seasonal) | 17.55/17.83 | 0.7657 | False | 0.2563 | 94.21
Holt-Winters (level, trend, and seasonal) | 17.55/17.17 | 0.7675 | 0.00164 | 0.2230 | 97.84

Year 1995–2013 (Phase III)
Simple mean | 19.85/26.87 | NA | NA | NA | 64.59
Moving averages (order 3 days) | 19.85/19.6 | NA | NA | NA | 98.74
Holt-Winters (only level) | 19.85/19.31 | 0.8055 | NA | NA | 97.32
Holt-Winters (level and trend) | 19.85/19.29 | 0.8192 | 0.01235 | False | 97.19
Holt-Winters (level and seasonal) | 19.85/19.51 | 0.7448 | False | 0.2563 | 98.29
Holt-Winters (level, trend, and seasonal) | 19.85/19.16 | 0.7376 | 0.00079 | 0.2445 | 96.53
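For most rows of Table 1, the reported accuracy appears consistent with the absolute percentage error of the prediction relative to the actual value; the exact formula is not stated in the paper, so this is an inference, checked here against the complete-dataset Holt-Winters (only level) row:

```python
def accuracy_pct(actual, predicted):
    # 100 * (1 - |actual - predicted| / actual): percentage accuracy of a
    # single forecast. This reproduces most, though not all, rows of Table 1.
    return 100.0 * (1.0 - abs(actual - predicted) / actual)

print(round(accuracy_pct(19.85, 18.26), 2))  # -> 91.99 (Table 1 reports 92.00)
```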
Fig. 1 The plot for the comparison of accuracy (%) of algorithms used as mentioned in Table 1
phases; hence, the three components do not have much influence on the data points. Thus, the accuracy of the algorithms is low in this phase, except for the simple mean.
3.2 Phase II
In this phase, the data between the years 1976–1994 is analyzed; its accuracy levels are considerably better (maximum 99.2%) than those of phases I and III, which do not vary significantly from each other. Except for the simple mean and moving average methods, all four remaining methods achieve an acceptable level of accuracy in phase II, as shown in Fig. 1, whereas in phase III all methods give good accuracy except the simple mean. The data points follow certain level, trend, and seasonal components in this phase, but the level component has more influence because the converged value of α is higher than that of the other two. The level lies between 12.75 °C (minimum) and 37.45 °C (maximum), a smaller difference than in the previous phase, and thus the accuracy rises. When the level component alone is applied, it gives 94.25% accuracy, but when the trend component is applied along with the level, the accuracy increases to 99.2%, which shows that the main contribution is from the level; the trend contributes comparatively little, raising the accuracy by only a few percentage points when both components are included in the formula. On the other hand, applying the level and seasonal components gives almost the same accuracy (94.21%) as applying the level alone, which shows that the seasonal component has no real effect on this set of data points. Finally, when all three components are applied, the accuracy is 97.8%, a value increased by adding the trend component along with the level and seasonal components. Thus, we conclude that the data series has large fluctuations, and hence α converges to a high value, around 0.8, which controls the series to predict a better forecast value. The contribution of the trend is considerable: even though the data points do not increase or decrease continuously, they follow a trend over very short intervals or periods, as can be seen in the graph, and hence the β value always converges to a low value. Lastly, the seasonal component always lies around 0.2, which shows that the seasonality adjustment is higher than the trend but much lower than the level; from this we understand that the series has very short-term seasonality, which can almost be considered unseasonal over a longer period of time.
3.3 Phase III
This phase analyzes the data from 1995 to 2013, which behaves almost the same as phase II; the accuracy values differ by only 1–4%. All methods give high accuracy except the simple mean method, and the reason might be that the data series was controlled strongly through the constants. Regarding the forecasting components, the discussion is the same as for phase II, but the series has a slightly more pronounced seasonal cycle than the other phases, which can be seen from the γ value, which converges around 0.25 and yields good accuracy when the seasonal component is enabled.
On analyzing the entire data series of 56 years, the accuracy is higher than in phase I but lower than in phases II and III, as shown in Fig. 1. The reduced accuracy relative to phases II and III is due to the data points of phase I, which are highly dispersed and irregular. It is therefore understood that the forecast accuracy for this dataset does not depend on its size, since the phase I accuracy is much lower than that of the entire dataset. Another important point is that, when implementing the forecasting algorithms in R, the constants α, β, and γ are applied automatically by R based on the data series. But when executing these algorithms by trial and error, i.e., by applying constant values from 0.1 to 0.9 manually, there is a small difference in the forecast value, and hence in the accuracy too. For example, in Table 1, for the Holt-Winters method (with level and trend), the obtained forecast value is 18.237 °C with an accuracy of 91.87% for the α value converged at 0.7727 by R, whereas with trial and error α converges to 0.9999, which leads to 18.399 °C and an accuracy of 92.3%. It is also found that there is no difference between the β value chosen by R and the manually applied value, though differences may occur in some cases. The difference in accuracy is just 0.5% for α, which is negligible, but this may not be the case for all data series of all datasets. Since a change in a constant's value leads to a change in the forecast value, it is advisable to apply trial and error for research purposes, whereas this is difficult in real-time business applications, which may involve many parameters and large numbers.
4 Conclusion
References
1. Poornima, S., Pushpalatha, M.: A journey from big data towards prescriptive analytics. ARPN J. Eng. Appl. Sci. 11(19), 11465–11474 (2016)
2. Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 85–126 (2004)
3. Shih, S.H., Tsokos, C.P.: A weighted moving average process for forecasting. J. Mod. Appl. Stat. Methods 7(1), 187–197 (2008)
4. Ostertagova, E., Ostertag, O.: Forecasting using simple exponential smoothing method. Acta Electrotechnica et Informatica 12(3), 62–66 (2012)
5. Hyndman, R.J., Khandakar, Y.: Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 27(3), 1–22 (2008)
6. Newbernem, J.H.: Holt-Winters forecasting: a study of practical applications for healthcare. Graduate program report in healthcare administration, Army-Baylor University, pp. 1–36 (2006)
7. Kalekar, P.S.: Time series forecasting using Holt-Winters exponential smoothing. Kanwal Rekhi School of Information Technology, pp. 1–13 (2004)
8. CRIDA Homepage, http://www.crida.in/. Last accessed 25 May 2017
K-Data Depth Based Clustering Algorithm
Abstract This paper proposes a new data clustering algorithm based on data depth. In the proposed algorithm, the centroids of the K clusters are calculated using the Mahalanobis data depth method. The performance of the algorithm, called the K-Data Depth Based Clustering Algorithm (K-DBCA), is evaluated in R using datasets defined in the mlbench package of R and from the UCI Machine Learning Repository; it yields good clustering results and is robust to outliers. In addition, it is invariant to affine transformations, and it is also tested for face recognition, where it yields better accuracy.
1 Introduction
Cluster analysis, or clustering, is a data analysis tool that groups data instances into clusters in such a way that the similarity between instances is maximal within a cluster and minimal across clusters. There are many applications of clustering, including customer segmentation [1], image recognition [2], genetic sequencing [3], and human mobility patterns [4]. A lot of research has been done in clustering, and many classes of clustering algorithms are found in the literature, namely hierarchical clustering [5], density clustering [6], grid clustering [7], and partitioning clustering [8]. The most commonly used clustering algorithm in data mining is the K-Means clustering algorithm [9], which is based on the principle of squared error [9, 10]. This paper proposes a new algorithm based on data depth, which we name the K-data depth based clustering algorithm (K-DBCA). Data depth gives the deepness of an instance in a dataset [11], which is shown in Fig. 1 using the Mahalanobis depth over the iris dataset.
Sections 2 and 3 give the details of data depth and the proposed method. Sections 4 and 4.1 present comparative outcomes of the proposed method against K-Means with regard to accuracy, using datasets defined in the mlbench package of R and from the UCI Machine Learning Repository. Section 4.2 demonstrates experimental results of the method with regard to invariance to affine transformations, and Sect. 4.3 presents comparative experimental clustering results of the algorithm against K-Means in terms of robustness to outliers. In Sect. 4.4 we test our algorithm on face recognition, and finally in Sect. 5 we give the conclusion of the proposed method, with limitations and future enhancements.
2 Data Depth
Data depth gives the centrality of an object in a data cloud and is an excellent tool for multivariate data analysis. Without making prior assumptions about the probability distributions of a dataset, we can analyze, quantify, and visualize it using data depth. Data depth assigns a value between 0 and 1 to each data point in the dataset, which specifies the centrality or deepness of that point in the dataset. The point with the maximum depth is the deepest point in the dataset. Various data depth methods are found in the literature; examples include convex-hull peeling depth [12, 13], half-space depth [14, 15], simplicial depth [16], regression depth [17, 18], and L1 depth [19]. Data depth has a very promising future as a data analytic tool, but due to the shortage of powerful tools for big data analytics using depth, it is not widely used. For a depth function to serve most effectively as a tool providing a deepest-to-outlying ordering of the instances of a dataset, it should satisfy the
MD(Y_i) = 1 / (1 + (Y_i − ȳ)ᵀ S⁻¹ (Y_i − ȳ))    (1)

where ȳ and S are the mean and covariance matrix of Y_n, respectively. The maximum-depth point is a center point; higher-depth points lie near the center, and lower-depth points are outliers. Since the mean is sensitive to outliers, Eq. 1 is modified so that a point Y_i is used in place of the mean. Using this modified equation, the depth of each point within a data cloud can be calculated.
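As a sketch, the unmodified Mahalanobis depth can be computed for bivariate data in pure Python; the hand-rolled 2x2 covariance inversion below is our own minimal implementation, not the authors' code:

```python
def mahalanobis_depth_2d(point, data):
    # MD(y) = 1 / (1 + (y - mean)^T S^{-1} (y - mean)) for 2-D data,
    # with the 2x2 sample covariance matrix S inverted by hand.
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mx) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form (y - mean)^T S^{-1} (y - mean), expanded for 2x2 S
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)

cloud = [(0, 0), (1, 1), (2, 0), (1, -1), (1, 0)]
print(mahalanobis_depth_2d((1, 0), cloud))  # -> 1.0 (the deepest point)
```

The central point of the cloud attains depth 1, and the depth decays toward 0 for outlying points, matching the ordering described above.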
3 Algorithm
Some of the concepts required for the formulation of the proposed algorithm are
defined below:
Steps:
1. The algorithm divides the n data points into k subsets. Then the initial center points for each subset are calculated. [Algorithm 1, lines 4–10]
2. The algorithm computes the neighbors of each cluster. [Algorithm 1, line 13]
3. The algorithm then computes the updated center points for each cluster. [Algorithm 1, lines 14–15]
4. The algorithm compares the previous cluster center points (CP1) with the updated cluster center points (CP2); if CP1 ≠ CP2, the previous center points (CP1) are replaced by the new center points (CP2) and the loop continues from line 12, else the loop terminates. [Algorithm 1, lines 16–19]
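Algorithm 1 itself is not reproduced in this excerpt; the following is one plausible one-dimensional reading of steps 1–4, in which each cluster's center is updated to its deepest (maximum-depth) member. The initialization and assignment details are assumptions, not the authors' exact procedure:

```python
def depth_1d(x, cluster):
    # 1-D Mahalanobis depth: 1 / (1 + (x - mean)^2 / variance)
    n = len(cluster)
    mean = sum(cluster) / n
    var = sum((c - mean) ** 2 for c in cluster) / n
    if var == 0:
        return 1.0 if x == mean else 0.0
    return 1.0 / (1.0 + (x - mean) ** 2 / var)

def k_dbca_1d(points, centers, max_iter=100):
    # Assumes no cluster ever becomes empty (true for well-chosen centers).
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest current center
        clusters = [[] for _ in centers]
        for x in points:
            idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[idx].append(x)
        # Step 3: the new center is the deepest point of each cluster
        new_centers = [max(c, key=lambda x: depth_1d(x, c)) for c in clusters]
        # Step 4: stop when the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return clusters

data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(k_dbca_1d(data, centers=[0.0, 12.0]))
# -> [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
```

Because the center is chosen as the deepest existing data point rather than the arithmetic mean, a single extreme outlier cannot drag the center arbitrarily far, which is the intuition behind the robustness claims in Sect. 4.3.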
16 I. Baidari and C. Patil
4 Results
The proposed method is evaluated on three properties: accuracy, invariance to affine transformations, and robustness to outliers; it is also tested for face recognition. The K-DBCA algorithm is evaluated in R using the clustering datasets from mlbench [22] and the UCI Machine Learning Repository. Figure 3a represents the shapes dataset from the mlbench package of R. The dataset is composed of a Gaussian, a square, a triangle, and a wave in two dimensions, consisting of 4000 instances and 3 attributes. Among the three attributes, one is the class attribute, which we have taken as the “ground truth”. A few datasets were derived from the original dataset for testing invariance to affine transformations such as rotation, scaling, and translation. Figure 4a represents the dataset rotated by 45° in the counterclockwise direction. For testing robustness to outliers, we considered datasets with outliers from the UCI Machine Learning Repository, including wine, stampout, and pen. The Adjusted Rand Index (ARI) [23] was used to compare the clustering outcome of the proposed method with the ground truth information of the original data. The ARI ranges from −1 to 1, where 1 indicates a perfect match while 0 and negative values indicate poor matching. How much information is lost during clustering by a method can be analyzed by the Variation of Information (VI) [24]. The VI index is 0 when the result obtained by a method is the same as the ground truth. Here, the proposed method's outcomes are examined against the ground truth information using ARI and VI.
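The paper computes ARI in R; for reference, the standard Hubert-Arabie form can be computed directly from two labelings (a minimal sketch, not the authors' code):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_a, labels_b):
    # ARI = (sum_ij C(n_ij,2) - E) / (0.5*(sum_i C(a_i,2) + sum_j C(b_j,2)) - E)
    # where E = sum_i C(a_i,2) * sum_j C(b_j,2) / C(n,2)
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))   # contingency table n_ij
    a = Counter(labels_a)                            # row sums a_i
    b = Counter(labels_b)                            # column sums b_j
    sum_ij = sum(comb(v, 2) for v in pair_counts.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    return (sum_ij - expected) / (0.5 * (sum_a + sum_b) - expected)

# A relabeled but identical partition still scores a perfect 1:
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```

Because ARI is corrected for chance, a clustering that is no better than random scores near 0, and disagreement can push it negative, matching the range described above.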
The first experiment investigated the accuracy of the proposed method on the basis of ground truth information. The algorithm was evaluated on different datasets from the UCI Machine Learning Repository, including the Iris, Seed, and Breast Cancer datasets; from mlbench, the smiley and shapes datasets were considered. The details of these datasets and the accuracies obtained using the K-DBCA algorithm are given in Table 1. The clustering outcome was compared with the “ground truth” information of the original dataset using the ARI and VI indices, as shown in Fig. 2a, b. Figure 3a shows one of the original datasets, and Fig. 3b represents the clustering result of the algorithm on the same dataset.
The performance of the proposed algorithm with respect to invariance to affine transformations such as rotation, scaling, and translation was tested by deriving a few more datasets from the original dataset and carrying out transformations on them. Figure 4a shows one affine-invariance example, in which the original dataset has been rotated counterclockwise by 45°, and Fig. 4b shows the clustering result of the K-DBCA algorithm for the same example. Figure 4c shows an original image rotated clockwise by 45°, and Fig. 4d shows the clustering result of the K-DBCA algorithm for the rotated image. As seen in Fig. 4b, d, the K-DBCA algorithm generated coherent clusters irrespective of the affine transformation.
For the affine-invariance tests, the clustering results were compared with the “ground truth” information of the original dataset using the ARI and VI indices, as shown in Fig. 5a, b.
The performance of the proposed algorithm with respect to invariance to affine transformations was also tested on a real dataset. For the real dataset, we took a UAV image from OpenDroneMap, as shown in Fig. 6a. The image was rotated by 10° clockwise and scaled by 0.8 along the X-axis, as shown in Fig. 6b. The features of these images were extracted using the AKAZE feature detection method [25], as shown in Fig. 6c, d. The algorithm was then applied to these features. The clustering results are shown in Fig. 7.
In this experiment, we tested the robustness of the algorithm to outliers. The algorithm was evaluated on different datasets with outliers from the UCI Machine Learning Repository,