You are on page 1of 11

Presentation topic:

Data
Transformation by
Normalization
Data Transformation:

 A function that maps the entire set of values • Normalization: Scaled to fall within
of a given attribute to a new set of i.e. a smaller, specified range.
replacement values each old value can be
identified with one of the new values min-max normalization
 Methods z-score normalization
normalization by decimal scaling
• Smoothing: Remove noise from data
• Attribute/feature construction • Discretization: Concept hierarchy
climbing
• New attributes constructed from the
given ones
• Aggregation: Summarization, data cube
construction

Data normalization
The data normalization (also
referred to as data pre-
processing) is a basic element of
data mining. It means
transforming the data, namely
converting the source data in to
another format that allows
processing data effectively. The
main purpose of data
normalization is to minimize or
even exclude duplicated data.
Methods for data normalization:

Min-max normalization
Min-max normalization performs a linear transformation on the original data.
Suppose that minA and maxA are the minimum and maximum values of an
attribute, A. Min-max normalization maps a value, vi of A to vi’ in the range
[new_minA, new_maxA] by computing

v  minA
v'  (new _ maxA  new _ minA)  new _ minA
maxA  minA
Min-max normalization preserves the relationships among the original data values. It
will encounter an “out-of-bounds” error if a future input case for normalization falls
outside of the original data range for A.
Methods for data normalization:
Example Min-max normalization
Suppose that the minimum and maximum values for the
attribute income are $12,000 and $98,000, respectively. We
would like to map income to the range [0.0,1.0]. By min-max
normalization, a value of $73,600 for income is transformed
to:

73,600  12,000
(1.0  0)  0  0.716
98,000  12,000
Methods for data normalization:

z-score normalization (or zero-mean normalization)


In z-score normalization (or zero-mean normalization), the values for an attribute, A, are
normalized based on the mean (i.e., average) and standard deviation of A. A value, vi, of A is
normalized to vi’ by computing

v  A
v' 
 A
Methods for data normalization:

Example Z-score normalization


z-score normalization. Suppose that the mean and standard
deviation of the values for the attribute income are $54,000
and $16,000, respectively. With z-score normalization, a value
of $73,600 for income is transformed to
Ex. Let μ = 54,000, σ = 16,000. Then

73,600  54,000
 1.225
16,000
Methods for data normalization:
Normalization by decimal scaling
Normalization by decimal scaling normalizes by moving the decimal
point of values of attribute A. The number of decimal points moved
depends on the maximum absolute value of A. A value, vi, of A is
normalized to vi’ by computing

v
v' 
10 j
Methods for data normalization:

Example of normalization by decimal scaling


Suppose that the recorded values of A range from −986 to 917.
The maximum absolute value of A is 986. To normalize by
decimal scaling, we therefore divide each value by 1000 (i.e., j
= 3) so that −986 normalizes to −0.986 and 917 normalizes to
0.917. Note that normalization can change the original data
quite a bit, especially when using z-score normalization or
decimal scaling. It is also necessary to save the normalization
parameters (e.g., the mean and standard deviation if using z-
score normalization) so that future data can be normalized in a
uniform manner.
Thank
you

You might also like