ABSTRACT
In data analytics, data normalization is not a new concept: it is a preprocessing stage of any
number-driven business problem. The goal of normalization is to bring the values of numeric
columns in a dataset to a common scale without distorting differences in the ranges of values. A
multitude of data normalization techniques are available, such as Min-Max normalization, Z-Score
normalization, and coefficient-based normalization. Data normalization may also vary with the level of
measurement of the variables: nominal scale, ordinal scale, interval scale, additive scale, etc.
The scope of this paper, however, is limited to continuous numeric data: it applies the proposed
MMAD normalization technique to standardize the values used to build a robust simple linear
regression model. A second aim of this paper is to compare the proposed MMAD normalization
technique against the min-max normalization method to assess its effectiveness and
robustness.
Language: English
$$ z = \frac{\left| \hat{\mu}_m - x_i \right|}{\hat{\sigma}_m} $$
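The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the numerator's hat-mu term is the sample median and the denominator's hat-sigma term is the unscaled median absolute deviation (some authors multiply the MAD by a consistency constant such as 1.4826, which is not done here). The function name `mmad_normalize` and the sample values are hypothetical.

```python
from statistics import median


def mmad_normalize(values):
    """Return |median - x_i| / MAD for every x_i (assumed reading of the formula)."""
    med = median(values)
    # Median absolute deviation: median of the absolute distances from the median.
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        raise ValueError("MAD is zero, so the score is undefined for this data")
    return [abs(med - x) / mad for x in values]


# Hypothetical sample with one extreme value.
data = [2.0, 4.0, 6.0, 8.0, 100.0]
scores = mmad_normalize(data)
print(scores)  # [2.0, 1.0, 0.0, 1.0, 47.0]
```

Because both the centre and the spread are medians, the single extreme value does not change the scores of the other four points; it only receives a large score itself.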
III. PROPOSED METHOD – MMAD – MEDIAN & MEDIAN ABSOLUTE DEVIATION BASED NORMALIZATION

In the introduction section of this paper we saw certain limitations of the min – max normalization
technique and set up the expectation that there has to be another data normalization technique which is:
1. Robust to the outlier problem
2. Applicable to any data size (small, medium or large)
3. Easy and fast to implement

The comparison study, done via data tabulation and graphical representation, is described below.
This study helps to identify the effectiveness of MMAD based data normalization. In the table below
we compare the proposed data normalization technique (MMAD) with the min – max normalization
technique, and also through box plots (Fig. 1 through 3 below), on highly non-normal customer
sentiment data and customer conversation data named call duration.
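The outlier sensitivity of min – max normalization mentioned above can be illustrated with a small sketch. The function name `min_max_normalize` and the numbers are hypothetical and are not taken from the paper's data; the code only shows the standard (x - min) / (max - min) rescaling.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the range is zero")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical data: the same sample with and without one extreme value.
clean = [10.0, 12.0, 14.0, 16.0, 18.0]
with_outlier = [10.0, 12.0, 14.0, 16.0, 210.0]

print(min_max_normalize(clean))
print(min_max_normalize(with_outlier))
```

In the second call the four ordinary points are squeezed into roughly the bottom 3% of the [0, 1] range, because the range in the denominator is set entirely by the extreme value.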
Data Normalization using Median & Median Absolute Deviation (MMAD) based Z-Score for Robust Predictions vs. Min – Max Normalization
Fig. 1
The above graph shows the non-normalized form of the data, which have different scales with
different centrality and data dispersion values.

After standardizing the values using the min – max data normalization technique we saw some
improvement in the centrality and data dispersion values, but not to a great extent. The box plot
below clearly shows the improvements as well as the opportunity to further standardize the data with
a better and more robust technique.
Fig. 2 Fig. 3
Figure 3 clearly shows how effective the MMAD data normalization technique is, as:

● It brought the value of centrality for two differently scaled datasets to almost a similar value
● It also reduced the data dispersion for both the sets; the min. value for both the data sets
has almost a null difference
● The median lines are almost identical

Now that we have standardized the data using min – max and the proposed MMAD based data
normalization techniques, we would like to see the effect of data normalization on the accuracy of
the linear regression model that we have created using the two normalized data sets, i.e. the
min – max normalized data set and the MMAD normalized data set.

This exercise addresses the alternative aim of this analysis, i.e. whether the proposed method
(MMAD normalization) helps to generate a better simple linear regression model compared to the
min – max normalization.
Fig. 4
When we ran the simple linear regression model using the proposed MMAD data normalization
technique, we did not see much of an improvement for the cubic regression model, with an
R-squared value of 78.20% (figure 6), which is pretty close to the min-max based model's
R-squared value. The linear model had an R-squared value of 54.25% (figure 7), compared to 51.91%
for min-max normalized data (figure 5), and the quadratic model 64.04% (figure 7), compared to
61.85% for min-max normalized data (figure 5).
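For readers who want to reproduce this kind of comparison, the R-squared of a one-predictor least-squares fit can be computed as sketched below. This is an illustrative sketch only: the function name `r_squared_simple_linear` and the toy numbers are hypothetical, the paper's data are not reproduced, and the quadratic and cubic fits reported above would need a polynomial fit that is not shown here.

```python
def r_squared_simple_linear(x, y):
    """R-squared of the ordinary least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / ss_tot


# Toy numbers for illustration (not from the paper); the result is about 0.64.
x_normalized = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(r_squared_simple_linear(x_normalized, y))
```

To compare two normalization methods in this way, the predictor would be normalized once with each method and the function called on each version against the same response values.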
Fig. 6