
2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2016), October 9-12, 2016, Budapest, Hungary

Empirical Data Analysis: A New Tool for Data Analytics

Plamen Angelov, Fellow, IEEE, Xiaowei Gu, and Dmitry Kangin
Data Science Group, School of Computing and Communications,
Lancaster University, Lancaster, UK
E-mail: p.angelov@lancaster.ac.uk

Jose Principe, Fellow, IEEE
Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering,
University of Florida, Gainesville, FL, USA
E-mail: principe@cnel.ufl.edu

Abstract: In this paper, a novel empirical data analysis approach (abbreviated as EDA) is introduced which is entirely data-driven and free from restricting assumptions and pre-defined problem- or user-specific parameters and thresholds. It is well known that traditional probability theory is restricted by strong prior assumptions which are often impractical and do not hold in real problems. Machine learning methods, on the other hand, are closer to the real problems but usually rely on problem- or user-specific parameters or thresholds, making them rather art than science. In this paper we introduce a theoretically sound yet practically unrestricted and widely applicable approach that is based on the density in the data space. Since the data may have exactly the same value multiple times, we distinguish between the data points and the unique locations in the data space. The number of data points, k, is larger than or equal to the number of unique locations, l, and at least one data point occupies each unique location. The number of different data points that have exactly the same location in the data space (equal value), f, can be seen as a frequency. Through the combination of the spatial density and the frequency of occurrence of discrete data points, a new concept called multimodal typicality, τ^{MM}, is proposed in this paper. It offers a closed analytical form that represents ensemble properties derived entirely from the empirical observations of the data. Moreover, it is very close to (yet different from) histograms, the probability density function (pdf) and fuzzy set membership functions. Remarkably, there is no need to perform complicated pre-processing such as clustering to get the multimodal representation. Moreover, the closed form for the case of Euclidean or Mahalanobis type of distance, as well as some other forms (e.g. cosine-based dissimilarity), can be expressed recursively, making it applicable to data streams and online algorithms. Inference/estimation of the typicality of data points that were not present in the data so far can also be made. This new concept allows us to rethink the very foundations of statistical and machine learning as well as to develop a series of anomaly detection, clustering, classification, prediction, control and other algorithms.

Keywords: empirical data analysis; multimodal typicality; data-driven; recursive calculation; inference; estimation.

(This work was supported by The Royal Society, grant number IE141329/2014.)

I. INTRODUCTION

Most human activities have already been largely changed in recent decades because of the very fast development of information technologies and the Internet. An astronomical and ever-increasing amount of data is being generated every day. As a result, data analysis is a rapidly growing field due to the strong need for processing large data sets or streams and converting the data into useful information.

Traditional approaches are based on a number of restrictive assumptions which usually do not hold in reality. This applies to fuzzy sets theory [1], with its subjective way of defining membership functions and its assumption of smooth and pre-defined membership functions. It also applies to probability theory [2] and statistical learning [3], [10]. These are essential and widely used tools for quantitative analysis of data and for the representation of stochastic types of uncertainties. However, they rely on a number of strong assumptions, including pre-defined smooth and "convenient to use" types of probability distribution, an infinite amount of observations/data points, independence between data points (so-called iid, independent and identically distributed data), etc. In most practical problems, these assumptions are not satisfied.

Till now, several alternative methods [5], [6] have been proposed aiming to avoid the problem of unrealistic prior assumptions and to get closer to the data rather than stick to theoretical prior assumptions, but these methods still use at some point an assumption of (albeit local) Gaussian/normal distribution.

In this paper, we introduce a novel empirical data analysis (EDA) approach. It is a further development of the recently introduced TEDA (typicality and eccentricity data analytics) framework [7]-[9]. It does not require any prior assumptions, which are usually unrealistic and restrictive, or parameters. Instead, it is entirely based on the empirical observations of discrete data points and their mutual position forming a unique pattern in the data space. It starts with calculating (recursively) the cumulative proximity, π, then the standardized eccentricity, ε, and the density, D, and finally, the multimodal typicality, τ^{MM}. Moreover, unlike the pdf, where identifying the number and position of modes is a well known problem usually solved by clustering, the multimodal typicality, τ^{MM}, is derived from the data automatically and without clustering or any other additional algorithm. It is often quite close to the histogram but is not the same, since it takes into account the mutual position of the data as well as the frequency of their occurrence, f.

The advantages of the multimodal typicality, τ^{MM}, are obvious because it combines the pdf, the histogram and the mutual distribution of the data points together and has a closed analytical form that can be manipulated further. In addition, the multimodal typicality, τ^{MM}, can be calculated recursively, which makes it suitable for application to online data streams.



Moreover, one can infer the typicality, τ(x), for any feasible value x by interpolation or extrapolation, and the typicality can be used in a manner similar to probability because it has the same properties of being between 0 and 1 and summing up to 1. It is demonstrated in this paper that it can be used to build a naïve typicality-based EDA Classifier [10] which provides better results not only than the naïve Bayes classifier (due to taking into account the actual distribution of the data) but also in comparison with the SVM classifier [11]. In addition to that, the naïve EDA Classifier does not require any prior assumption about the distribution of the data, any selection of the type of the kernel or threshold constants, or any iterative optimization, as the SVM approach does.

Another very interesting feature is that this new method is free from some paradoxes that the traditional probability density function has [12]. Additionally, it combines effectively the two different types of randomness representation: a) the one based on the ratio of the number of times a discrete random variable occurs, p = lim_{k→∞} k*/k, where k* is the number of times of occurrence and k is the total number of data points/observations, see Fig. 2, and b) the one based on the pdf, see Fig. 1(d). In the probability theory and statistics literature the transition between a) and b) is often neglected. In fact, a) (the multimodal version, τ^{MM}) is suitable for games such as dice, coins, balls, etc., or "pure" random processes of white-noise type, while b) (the unimodal typicality, τ) is applicable to most real processes, which are a more complex mixture of deterministic and random or, more correctly, complex phenomena. The proposed typicality works well for both a) and b) and combines them in a unique closed-form representation without assuming k→∞, a type of the distribution, or any user- or problem-specific parameters.

This novel non-parametric and assumption-free empirical data analysis framework is very promising and attractive because it is entirely based on the actual discrete/digital data, and we live in the era of the big (digital) data revolution. It is logical to develop and use methodologies and approaches that are suitable and tailored to this reality rather than stick to methods that were introduced centuries ago analyzing primarily analog signals and pure random phenomena like games, or that assume distributions a priori.

The rest of this paper is organized as follows: Section II introduces the novel empirical data analysis (EDA) framework, including the theoretical basis, the multimodal typicality and the recursive calculations. Section III describes the method of making inference within this novel framework. An additional example of multimodal typicality and inference on a benchmark dataset is presented in Section IV. The naïve EDA Classifier is introduced and the results of its application to benchmark problems are compared with other classifiers in Section V. Finally, the conclusions and future directions are provided in Section VI.

II. EMPIRICAL DATA ANALYSIS FRAMEWORK

First, let us introduce the main quantities that represent ensemble properties of the data within EDA:

1) cumulative proximity, π;
2) standardized eccentricity, ε;
3) density, D; and finally
4) typicality, τ.

In addition, the calculations can be recursive, and the multimodal form of the typicality is also introduced.

A. Theoretical Basis

First of all, let us consider the real Hilbert space R^d and assume a particular data set or stream denoted as {x_1, x_2, ..., x_k} ∈ R^d, where the subscripts denote the time instance at which the data point arrives. Within the data set/stream, some of the data points may repeat more than once, namely, ∃ i, j | x_i = x_j. The set of unique data point locations at time instance k can be defined as {u_1, u_2, ..., u_l} ∈ R^d, with the corresponding numbers of times {f_1, f_2, ..., f_l} that different data points occupy the same unique locations (f_i can be viewed as a frequency if it is divided by k). Obviously, always l ≤ k, but more often in real problems, because of having exactly the same values many times, l << k. Based on {u_1, u_2, ..., u_l} and {f_1, f_2, ..., f_l}, we can reconstruct the data set {x_1, x_2, ..., x_k} exactly if needed, regardless of the order of arrival of the data points. Further in this paper, all derivations are conducted at the k-th time instance by default if there is no special declaration.

1) Cumulative proximity

Cumulative proximity is a measure indicating the degree of closeness/similarity of a particular data point to all other existing data points [7]-[9]. The cumulative proximity of the unique data point u_i is expressed as:

    π_k(u_i) = Σ_{j=1}^{l} d²(u_i, u_j),  i = 1, 2, ..., l    (1)

where d(u_i, u_j) is the distance between two unique locations u_i and u_j. The distance can be of Euclidean or Mahalanobis type, based on cosine [12], or any other metric.

2) Standardized eccentricity

Eccentricity was introduced in [7]-[9] to represent the association of a data point with the tail of the distribution and the property of being an outlier/anomaly [8]. The standardized eccentricity of u_i (i = 1, 2, ..., l) is calculated as:

    ε_k(u_i) = 2π_k(u_i) / E[π_k(u)],  E[π_k(u)] > 0, k > 1, l > 1    (2)

where E[π_k(u)] = (1/l) Σ_{j=1}^{l} π_k(u_j).
It is interesting to note that Σ_{i=1}^{l} ε_k(u_i) = 2l.

3) Density

Data density is inversely proportional to the standardized eccentricity [9]. The density of u_i (i = 1, 2, ..., l) is defined as equation (3):

    D_k(u_i) = 1 / ε_k(u_i)    (3)

4) Typicality

Typicality is a normalized data density [9]:

    τ_k(u_i) = D_k(u_i) / Σ_{j=1}^{l} D_k(u_j),  Σ_{j=1}^{l} D_k(u_j) > 0, k > 1, l > 1    (4)

or, in terms of the cumulative proximity:

    τ_k(u_i) = [ π_k(u_i) Σ_{j=1}^{l} (π_k(u_j))^{-1} ]^{-1},  i = 1, 2, ..., l    (5)

or, in terms of the distance:

    τ_k(u_i) = [ Σ_{j=1}^{l} d²(u_i, u_j) · Σ_{j=1}^{l} ( Σ_{m=1}^{l} d²(u_j, u_m) )^{-1} ]^{-1}    (6)

The cumulative proximity, π, standardized eccentricity, ε, density, D, and typicality, τ, for a real climate data set [13] measured in Manchester, England in the period 2010-2015, taking only winter and summer days, are shown in Fig. 1 as illustrative examples.
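To make these definitions concrete, here is a minimal Python sketch (an editorial illustration, not code from the paper; the function name eda_quantities, the use of NumPy and the choice of the Euclidean distance are assumptions) that computes the cumulative proximity (1), the standardized eccentricity (2), the density (3) and the typicality (4) for a set of l > 1 unique locations:

    import numpy as np

    def eda_quantities(U):
        # U: (l, d) array of unique data locations; Euclidean distance assumed
        U = np.asarray(U, dtype=float)
        # pairwise squared distances d^2(u_i, u_j)
        d2 = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
        # eq. (1): cumulative proximity pi_k(u_i) = sum_j d^2(u_i, u_j)
        pi = d2.sum(axis=1)
        # eq. (2): standardized eccentricity, with E[pi] the mean over locations
        eps = 2.0 * pi / pi.mean()
        # eq. (3): density is the inverse of the standardized eccentricity
        D = 1.0 / eps
        # eq. (4): typicality is the density normalized to sum to 1
        tau = D / D.sum()
        return pi, eps, D, tau

Consistent with the remark above, eps sums to 2l and tau sums to 1 for any such input.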

Fig. 1. The cumulative proximity, standardized eccentricity, density and typicality of the unique data points of the climate dataset: (a) Cumulative proximity; (b) Eccentricity; (c) Density; (d) Typicality
B. Multimodal Typicality

In this paper, we further introduce a multimodal typicality, τ^{MM}, which is derived directly through the combination of the cumulative proximity, π, and the typicality, τ, of the unique data locations {u_1, u_2, ..., u_l} and the corresponding numbers of times the data occupy the same unique location, {f_1, f_2, ..., f_l}. The multimodal typicality, τ^{MM}, is defined as:

    τ_k^{MM}(u_i) = f_i τ_k(u_i) / Σ_{j=1}^{l} f_j τ_k(u_j) = f_i D_k(u_i) / Σ_{j=1}^{l} f_j D_k(u_j)    (7)

where Σ_{j=1}^{l} f_j τ_k(u_j) > 0, Σ_{j=1}^{l} f_j D_k(u_j) > 0, k > 1, l > 1.

A simple example illustrating how the multimodal typicality works for small values of k (amounts of data points), and the fact that it coincides with the frequency-based probability, p = lim_{k→∞} k*/k [2], [4], [10], is shown in Fig. 2, where a small data set of 3 data points is considered in which two of them have exactly the same value: x = {2; 5; 5}. Obviously, u = {2; 5}; l = 2; k = 3; l < k. Naturally, the value of τ^{MM} = {1/3; 2/3}, while for such a small number of points k it is not possible to get a meaningful pdf; but if we follow the purely frequentistic form p = lim_{k→∞} k*/k [2], the result will be exactly the same. Yet, the typicality (defined in equations (5)-(6)) automatically starts to approximate the pdf (see Fig. 1(d)) for a larger number of points, and if we use the multimodal typicality (equation (7), which also takes the frequency, f, into account), it provides a multimodal likelihood quite close to the histogram, see Figures 3-5.

Fig. 2. A simple example of multimodal typicality

To further investigate the advantages of the multimodal typicality, the 3-dimensional multimodal typicality, τ^{MM}, of the climate dataset [13] is shown in Fig. 3. A comparison of τ^{MM} with the traditional histogram [4], [10] as well as with the normal (Gaussian) pdf [4], [10] is also made in a 2-D graph for visual clarity, shown in Fig. 4.

Fig. 3. 3-dimensional multimodal typicality of the climate dataset

Fig. 4. Comparison of the multimodal typicality, histogram and Gaussian pdf of the climate dataset in 2-D: (a) Temperature; (b) Wind speed

The advantages of the multimodal typicality, τ^{MM}, can be summarized as follows (a small numerical sketch follows the list):

1. This typicality takes the spatial density, D, into consideration.
2. The frequency of occurrence, f, of a certain data sample is also taken into account.
3. There is no need for a clustering algorithm, thresholds, any parameters or complicated pre-processing to generate the multimodal typicality distribution.
4. It is provided in a closed analytical form without any prior assumptions made.
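The small example of Fig. 2 can be reproduced with a few lines on top of the earlier sketch (again an editorial illustration; eda_quantities is the hypothetical helper defined after equation (5)):

    import numpy as np

    def multimodal_typicality(U, f):
        # eq. (7): weight each location's density by its occurrence count f_i
        _, _, D, _ = eda_quantities(U)
        f = np.asarray(f, dtype=float)
        return f * D / np.sum(f * D)

    # Fig. 2 example: x = {2; 5; 5}, hence u = {2; 5}, f = {1; 2}, l = 2, k = 3
    U = np.array([[2.0], [5.0]])
    f = np.array([1, 2])
    print(multimodal_typicality(U, f))  # -> [0.3333 0.6667], i.e. {1/3; 2/3}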
Multimodal typicality, τ^{MM}, is a function having the following properties:

a) it sums up to 1;
b) it is very close to (but not the same as) the histogram;
c) its value is within the range [0; 1];
d) it combines the two completely different forms used so far, the histogram and the pdf, in one expression;
e) it combines the two representations of probability (frequency-based and distribution-based);
f) it is free from the paradoxes that the pdf is related to [12].

For small values of k, the multimodal typicality is exactly the same as the frequentistic form of probability (see Fig. 2), and with large k, it tends to the pdf. For l < k or l << k (when there are different data points with the same value), different modes will appear automatically, while for the cases when l = k or l ≈ k one may still need clustering to get a multimodal representation [12].

Fig. 5. Examples of inference of multimodal typicality: (a) Interpolation; (b) Extrapolation; (c) 3D multimodal typicality with inferences of 5 arbitrary data points

C. Recursive Calculations

Recursive calculations play a very significant role in online data stream processing [14]. With the utilization of recursive calculation, there is no need to keep a large amount of data samples in memory, which is more computation and memory efficient. The multimodal typicality, τ^{MM}, can be updated recursively as follows.

If the data point that arrives at time instance k+1, x_{k+1}, is not unique (there is already a point with exactly the same value), then the multimodal typicality can be updated using the following equation:

    τ_{k+1}^{MM}(u_i) = f_i D_k(u_i) / ( Σ_{j=1}^{l} f_j D_k(u_j) + D_k(u_m) ),  for u_i ≠ x_{k+1}
    τ_{k+1}^{MM}(u_i) = (f_i + 1) D_k(u_i) / ( Σ_{j=1}^{l} f_j D_k(u_j) + D_k(u_m) ),  for u_i = x_{k+1}    (8)

where u_m denotes the unique location that coincides with x_{k+1}.

In the case when the newly arriving data point, x_{k+1}, has a unique new location, u_{l+1}, then the set of unique locations {u_1, u_2, ..., u_l} is appended by u_{l+1} = x_{k+1}, becoming {u_1, u_2, ..., u_l, u_{l+1}}, and {f_1, f_2, ..., f_l} is also appended by f_{l+1} = 1, becoming {f_1, f_2, ..., f_l, f_{l+1}}, i.e. {f_1, f_2, ..., f_l, 1}.

The cumulative proximity can be updated as follows:

    π_{k+1}(u_i) = π_k(u_i) + d²(u_i, u_{l+1}),  i = 1, 2, ..., l    (9)
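The bookkeeping behind equations (8) and (9) can be sketched as follows (a hypothetical state layout chosen for illustration; the authors' own implementation is not published with the paper):

    import numpy as np

    def update_state(U, f, pi, x_new):
        # U: (l, d) unique locations; f: (l,) counts; pi: (l,) cumulative proximities
        x_new = np.asarray(x_new, dtype=float)
        d2_new = np.sum((U - x_new) ** 2, axis=1)  # d^2(u_i, x_new) for every i
        hit = np.flatnonzero(d2_new == 0.0)
        if hit.size:
            # repeated value: only the frequency of the matching location grows,
            # which is what changes the multimodal typicality in eq. (8)
            f = f.copy()
            f[hit[0]] += 1
        else:
            # new unique location u_{l+1} = x_new with f_{l+1} = 1; eq. (9) extends
            # the stored cumulative proximities without revisiting old points
            pi = np.append(pi + d2_new, d2_new.sum())
            U = np.vstack([U, x_new])
            f = np.append(f, 1)
        return U, f, pi

From the updated pi and f, the densities and the multimodal typicality then follow as in equations (2)-(7).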
For the case when the Euclidean distance is considered, the cumulative proximity of the new location becomes [7]-[9]:

    π_{k+1}(u_{l+1}) = (l+1) ( (u_{l+1} - μ_{l+1})^T (u_{l+1} - μ_{l+1}) + X_{l+1} - μ_{l+1}^T μ_{l+1} )    (10)

where the mean, μ_{l+1}, and the average scalar product, X_{l+1}, of the unique locations are updated recursively as:

    μ_{l+1} = (l/(l+1)) μ_l + (1/(l+1)) u_{l+1}    (11)

    X_{l+1} = (l/(l+1)) X_l + (1/(l+1)) u_{l+1}^T u_{l+1}    (12)

After the updating of the cumulative proximities, the typicality of all the unique data points, including the new one, at time instance k+1 can be updated using equation (6), and their corresponding multimodal typicality at time instance k+1 can be updated using equation (7).
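Equations (10)-(12) show that, in the Euclidean case, only the running mean μ and the running average scalar product X need to be stored; a brief sketch under that assumption (names are illustrative):

    import numpy as np

    def euclidean_pi_new(mu, X, l, u_new):
        # mu: (d,) mean of the l unique locations; X: scalar mean of u^T u
        u_new = np.asarray(u_new, dtype=float)
        mu_new = (l * mu + u_new) / (l + 1)            # eq. (11)
        X_new = (l * X + u_new @ u_new) / (l + 1)      # eq. (12)
        dev = u_new - mu_new
        pi_new = (l + 1) * (dev @ dev + X_new - mu_new @ mu_new)   # eq. (10)
        return pi_new, mu_new, X_new

As a quick check, appending u_new = 3 to the single 1-D location {0} gives π = 9, which matches the direct sum of squared distances.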

III. INFERENCE USING THE MULTIMODAL TYPICALITY

In this section, the inference mechanism based on the proposed empirical data analysis framework, EDA, is introduced. The inference only applies to feasible points and, therefore, the first step is to check whether a data point is feasible or not, which is problem dependent; e.g., there cannot be a negative mass, distance, age, etc. For feasible values, we can then estimate the typicality as follows.

For a data sample that occupies a new, unseen location, the set of unique locations is first updated by appending u_{l+1}: {u_1, u_2, ..., u_l, u_{l+1}}. Then μ_{l+1} and X_{l+1} are recursively updated using equations (11) and (12), and then, most importantly, π_{k+1}(u_{l+1}) is calculated using equations (6)-(10).

If the data point at which an estimate/inference of the typicality is made is within the range of the dataset (that is, we perform an interpolation, see Fig. 5(a), the red line), then the frequency f_{l+1} is estimated as the average of the frequencies of the two nearest data points in each dimension:

    f_{l+1} = (1/d) Σ_{i=1}^{d} [ (u_{l+1,i} - u_{L,i}) f_{R,i} + (u_{R,i} - u_{l+1,i}) f_{L,i} ] / (u_{R,i} - u_{L,i})    (13)

where u_{R,i} and u_{L,i} are the i-th dimensional values of the unique locations u_R and u_L that are the nearest to u_{l+1} in the i-th dimension and satisfy u_{L,i} < u_{l+1,i} < u_{R,i}.

If the data point is outside of the range (that is, extrapolation, see Fig. 5(b), the red line), the frequency is set to 1: f_{l+1} = 1. Because such points for which inference is made do not actually exist (they are virtual), they should not influence the typicality of the other, really existing data points. The density of the virtual data point is:

    D_k(u_{l+1}) = E[π_k(u)] / (2 π_k(u_{l+1})) = Σ_{j=1}^{l} Σ_{m=1}^{l} d²(u_j, u_m) / ( 2l Σ_{j=1}^{l} d²(u_{l+1}, u_j) )    (14)

Therefore, using equation (7), the typicality of any new data point can easily be estimated using D_k(u_{l+1}) and f_{l+1}:

    τ_k^{MM}(u_{l+1}) = f_{l+1} D_k(u_{l+1}) / Σ_{j=1}^{l+1} f_j D_k(u_j)    (15)

Fig. 5(c) depicts a 3D example with five feasible points for which an inference is made, shown in red. In Fig. 6, inference for many points is made (in red), which leads to a typicality graph that looks continuous (it is not continuous because the number of actual data points, k, plus the points for which the inference is made is not infinite).

Fig. 6. Curves of empirical likelihood of the climate dataset in 2-D: (a) Temperature; (b) Wind speed
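The whole inference step can be sketched as below (an editorial illustration with assumed names; the per-dimension neighbour search for equation (13) is written naively for clarity):

    import numpy as np

    def infer_typicality(U, f, x):
        # U: (l, d) unique locations, f: (l,) counts, x: (d,) feasible virtual point
        U = np.asarray(U, float); f = np.asarray(f, float); x = np.asarray(x, float)
        l, d = U.shape
        d2_all = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
        pi = d2_all.sum(axis=1)                      # cumulative proximities, eq. (1)
        if np.all((x >= U.min(axis=0)) & (x <= U.max(axis=0))):
            # interpolation: eq. (13), averaging the linearly weighted frequencies
            # of the nearest left/right neighbours in each dimension
            f_new = 0.0
            for i in range(d):
                lo, hi = U[:, i] <= x[i], U[:, i] >= x[i]
                uL, fL = U[lo, i].max(), f[lo][np.argmax(U[lo, i])]
                uR, fR = U[hi, i].min(), f[hi][np.argmin(U[hi, i])]
                f_new += (fL if uR == uL else
                          ((x[i] - uL) * fR + (uR - x[i]) * fL) / (uR - uL))
            f_new /= d
        else:
            f_new = 1.0                              # extrapolation: f_{l+1} = 1
        D_x = pi.mean() / (2.0 * np.sum((U - x) ** 2))   # eq. (14)
        D = pi.mean() / (2.0 * pi)                   # densities of existing points
        return f_new * D_x / (f @ D + f_new * D_x)   # eq. (15)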
IV. EXAMPLE

In this section, another example of multimodal typicality, τ^{MM}, and inference for a benchmark dataset is presented. The benchmark dataset in question is a wrist-worn accelerometer (3-dimensional) data set for activities of daily living (ADL) recognition, described in [15]. In this paper, only part of the data is used (5 clusters with 150 data points per cluster).
The data is real and interesting because it is clearly not unimodal. The proposed multimodal typicality can be determined without any clustering, and it is visually very similar to the histograms that can be obtained. The 3D graph of the multimodal typicality and the inference for the wrist-worn dataset is shown in Fig. 7(a) (the inference made for several points is shown in red). The 2-D curves of the empirical likelihood of the data in the 3 dimensions (x-axis, y-axis, z-axis) are shown respectively in Fig. 7(b)-(d).

Fig. 7. Example of the wrist-worn dataset: (a) 3D multimodal typicality with inference; (b)-(d) 2-D empirical likelihood curves for the x-, y- and z-axes

V. NAÏVE EDA CLASSIFIER

The so-called naïve Bayes classifier has been extensively studied and widely used in various fields [6], [10]. Naïve Bayes classifiers conduct classification based on a pre-defined pdf, usually assuming a normal (Gaussian) distribution of the data, which is obviously not the case in reality (as seen above). In this paper, a new simple, yet quite effective (especially for complex problems) classifier is introduced, called the naïve EDA classifier, using essentially the same concept but representing the data distribution by its multimodal typicality instead of a pre-defined smooth (but idealized) pdf.

Assuming there are C classes, the class label for newly arriving data points is determined by:

    C(x) = arg max_{v=1,...,C} ( τ_{k,v}^{MM}(x) )    (16)

where the multimodal typicality of x in the v-th class is defined per class as:

    τ_{k,v}^{MM}(x) = f_{v,l_v+1} D_{k,v}(x) / Σ_{j=1}^{C} Σ_{i=1}^{l_j} f_{j,i} D_{k,j}(u_{j,i})    (17)

where the index l_v indicates the number of points in the v-th class; D_{k,v}(x) is calculated by equation (14) per class, and f_{v,l_v+1} is calculated by equation (13). Indeed, the class label for x is decided by the multimodal typicality with the highest value among all classes.
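Because the denominator of equation (17) is shared across all classes, the arg max of equation (16) only needs the per-class numerators; a minimal sketch (assumed data layout; for brevity, the interpolated frequency of equation (13) is replaced here by the extrapolation default f = 1):

    import numpy as np

    def naive_eda_classify(class_data, x):
        # class_data: {label: (U_v, f_v)} with the unique locations and counts
        # observed in each class; x: the point to be classified
        x = np.asarray(x, dtype=float)
        score = {}
        for label, (U_v, f_v) in class_data.items():
            U_v = np.asarray(U_v, dtype=float)
            l = len(U_v)
            d2_all = np.sum((U_v[:, None, :] - U_v[None, :, :]) ** 2, axis=-1)
            # per-class density of x, eq. (14)
            D_x = d2_all.sum() / (2.0 * l * np.sum((U_v - x) ** 2))
            score[label] = 1.0 * D_x      # frequency estimate fixed to 1 here
        return max(score, key=score.get)  # eq. (16): arg max over the classes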
The performance of the proposed naïve EDA classifier was tested on a well-known challenging problem, the PIMA dataset [16]. It was compared with the best known classifier, SVM, and with the naïve Bayes classifier, which it resembles. First, 90% (691 points) of the data set were used for training. The PIMA dataset is described in [16]; in this paper we only use the following attributes:

1) number of times pregnant;
2) plasma glucose concentration at 2 hours in an oral glucose tolerance test;
3) diastolic blood pressure (mm Hg);
4) triceps skin fold thickness (mm).

The results are depicted in Table I in the form of a confusion matrix. The proposed naïve EDA classifier provides 80.5% accuracy, compared with 79.2% for the SVM classifier using a linear kernel function [11] and 77.9% for the naïve Bayes classifier using a Gaussian distribution. Obviously, the performance of the proposed naïve EDA classifier is the best, which is not unexpected, because it takes into account the real data distribution rather than idealizing it. Moreover, it does not require the decision maker to choose a distribution or parameters, nor to use iterative optimization; it is an entirely data-driven (objective) approach and is free from problem- and user-specific parameters and assumptions.

TABLE I. CONFUSION MATRIX FOR THE VALIDATION DATA

Method                   Actual\Classification   Negative               Positive
Naïve EDA Classifier     Negative                73.91% (34 samples)    26.09% (12 samples)
                         Positive                9.68%  (3 samples)     90.32% (28 samples)
Naïve Bayes Classifier   Negative                82.61% (38 samples)    17.39% (8 samples)
                         Positive                29.03% (9 samples)     70.97% (22 samples)
SVM Classifier           Negative                82.61% (38 samples)    17.39% (8 samples)
                         Positive                25.81% (8 samples)     74.19% (23 samples)

VI. CONCLUSION AND FUTURE DIRECTION

In this paper, a novel non-parametric empirical data analysis approach is introduced which is free from any prior assumptions. This approach is entirely based on empirical observations of discrete data points, and the ensemble data properties are extracted from the data directly and automatically. A new concept, called multimodal typicality, is also proposed within this framework. Multimodal typicality takes the mutual distributions and the frequencies of occurrence of the data points into account at the same time and combines the classical pdf and the histogram into one expression. It also combines the frequentistic and the distribution-based interpretations of probability. There is no need for any complicated preprocessing or clustering algorithm to generate the multimodal typicality, and it can also be calculated recursively. Within this new framework, inferences of the multimodal typicality of virtual data points can be made entirely based on the existing data points, and a curve of empirical likelihood can optionally be drawn. This new empirical data analysis approach is very powerful in real cases and will be a promising tool in the field of data analytics.

As future work, this novel empirical data analysis approach will be applied to various machine learning problems such as anomaly detection, clustering, more sophisticated classification algorithms, data prediction, etc.

REFERENCES

[1] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338-353, 1965.
[2] T. Bayes, "An essay towards solving a problem in the doctrine of chances," Phil. Trans. Roy. Soc. London, vol. 53, p. 370, 1763.
[3] A. Gammerman, V. Vovk, H. Papadopoulos (Eds.), Statistical Learning and Data Sciences, Springer, 2015, pp. 169-178.
[4] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2007, ISBN-13: 978-0387310732.
[5] P. Del Moral, "Non linear filtering: interacting particle solution," Markov Processes and Related Fields, vol. 2, no. 4, pp. 555-580, 1996.
[6] J. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives, Springer, 2010, ISBN: 978-1-4419-1569-6.
[7] P. Angelov, "Typicality distribution function: a new density-based data analytics tool," in IEEE International Joint Conference on Neural Networks (IJCNN), Killarney, 2015, pp. 1-8.
[8] P. Angelov, "Anomaly detection based on eccentricity analysis," in IEEE Symposium on Evolving and Autonomous Learning Systems (EALS), Orlando, USA, 2014, pp. 1-8.
[9] P. Angelov, "Outside the box: an alternative data analytics framework," Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 8, no. 2, pp. 53-59, 2014.
[10] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, Springer, 2009, ISBN-13: 978-0387952840.
[11] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000, ISBN: 0521780195 (hb).
[12] P. Angelov, X. Gu and D. Kangin, "TEDA: typicality based empirical data analysis," unpublished.
[13] http://www.worldweatheronline.com
[14] P. Angelov, Autonomous Learning Systems: From Data Streams to Knowledge in Real Time, John Wiley & Sons, Ltd., 2012, ISBN: 9781119951520.
[15] http://archive.ics.uci.edu/ml/datasets/Dataset+for+ADL+Recognition+with+Wrist-worn+Accelerometer
[16] https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
