Laila Malas BI Data preprocessing 20150042

Data preprocessing    Definition    Subtasks    Methods

Data Consolidation
Definition: Understanding the nature of the data in depth by collecting it from the variety of sources that have been identified (understanding the depth of the data and which parts are not needed and can be removed).
Subtasks:
• Collect data
• Select data
• Integrate data from multiple databases
• Schema integration
• Filter the data
Methods:
o Statistical tests
o SQL queries
o Web services
o Real-life-driven data mapping
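
The consolidation subtasks (collect, select, integrate, map schemas, filter) can be sketched in a few lines. This is only an illustration under stated assumptions: the two in-memory SQLite databases, the table and column names, and the SCHEMA_MAP dictionary are all hypothetical stand-ins for real source systems.

```python
import sqlite3

# Hypothetical schema mapping: source column name -> unified column name.
SCHEMA_MAP = {
    "crm": {"cust_id": "customer_id", "full_name": "name", "country": "country"},
    "sales": {"customer": "customer_id", "total": "amount"},
}

def fetch(conn, table, mapping):
    """Collect and select: read only the mapped columns and rename them."""
    cols = ", ".join(f"{src} AS {dst}" for src, dst in mapping.items())
    cur = conn.execute(f"SELECT {cols} FROM {table}")
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]

def consolidate(crm_rows, sales_rows, country):
    """Integrate the two sources on customer_id and filter by country."""
    by_id = {r["customer_id"]: r for r in crm_rows if r["country"] == country}
    merged = []
    for sale in sales_rows:
        customer = by_id.get(sale["customer_id"])
        if customer is not None:
            merged.append({**customer, "amount": sale["amount"]})
    return merged

# Two in-memory databases stand in for separate source systems.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT, country TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Ann", "JO"), (2, "Bob", "US")])

sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (customer INTEGER, total REAL)")
sales.executemany("INSERT INTO orders VALUES (?, ?)",
                  [(1, 10.0), (2, 5.0), (1, 2.5)])

crm_rows = fetch(crm, "customers", SCHEMA_MAP["crm"])
sales_rows = fetch(sales, "orders", SCHEMA_MAP["sales"])
result = consolidate(crm_rows, sales_rows, country="JO")
```

Keeping the column mapping in one dictionary means a new source can be added by describing its columns rather than by writing new integration code.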
Data Cleaning
Definition: The process of detecting incorrect, inaccurate, and incomplete data and then removing the dirty data that causes noise.
Subtasks:
• Detect outliers
• Eliminate inconsistencies
• Fill in the missing values and tuples (records) in the table
Methods:
o Use statistical techniques to identify outliers, using the average and standard deviation
o Mean, median, or mode to reduce noise in the data; record missing values with 'ML'
o Identify wrong values in the data other than outliers, such as odd values
o Regression and clustering
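
Two of the cleaning methods above, outlier detection with the average and standard deviation and filling missing values with the mean, median, or mode, can be sketched as follows. The threshold k=2.0, the function names, and the sample ages are assumptions made for the illustration, not values from the source.

```python
import statistics

def find_outliers(values, k=2.0):
    """Return values lying more than k standard deviations from the mean."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std = statistics.pstdev(present)
    if std == 0:
        return []
    return [v for v in present if abs(v - mean) > k * std]

def fill_missing(values, strategy="median"):
    """Replace None with the mean, median, or mode of the known values."""
    present = [v for v in values if v is not None]
    chooser = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = chooser(present)
    return [fill if v is None else v for v in values]

# Hypothetical column with one missing value and one implausible age.
ages = [23, 25, None, 24, 26, 22, 25, 120]

outliers = find_outliers(ages)
cleaned = [v for v in ages if v not in outliers]   # drop outliers first
filled = fill_missing(cleaned, "median")           # then impute
```

Removing the outliers before imputing keeps an extreme value such as 120 from distorting the fill value, which matters most when the mean is used.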
Data Transformation
Definition: The process of converting data into one format, from unstructured to structured; changing the architecture of data that cannot be analyzed as it is.
Subtasks:
• Normalize data
• Aggregate the data
• Build new attributes (construct)
Methods:
o Put the values into a standard range so the data is easy to specify (0 to 1 or -1 to 1)
o Convert numeric values into a discrete representation using ranges, or frequencies for categorical values
o Use mathematical functions on the data and derive new variables, such as add and subtract
o Data cube construction
o Smoothing: remove the noise from the data
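
Three of the transformation methods above can be sketched briefly: scaling to a standard range (0 to 1 or -1 to 1), converting numeric values into discrete ranges, and deriving a new variable by subtraction. Min-max scaling and equal-width binning are one common choice for each step and are assumed here; the source does not name a specific technique, and the sample numbers are made up.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly into the range [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Map each numeric value to the index of an equal-width range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value falls on the upper edge, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

incomes = [20, 30, 40, 60, 100]

scaled_01 = min_max_scale(incomes)              # range 0 to 1
scaled_11 = min_max_scale(incomes, -1.0, 1.0)   # range -1 to 1
bins = equal_width_bins(incomes, 4)             # discrete representation

# Derive a new variable from two existing ones with a simple subtraction.
revenue = [100, 150]
cost = [60, 90]
profit = [r - c for r, c in zip(revenue, cost)]
```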

Data Reduction
Definition: The process of reducing the huge amount of data that needs to be stored in the data storage environment, which reduces storage needs and costs.
Subtasks:
• Reduce the number of variables
• Reduce the number of cases
• Balance skewed data
Methods:
o Correlation analysis and decision trees
o Histograms: determine period beginning and end
o Sampling: determine small samples
o Data compression
o Hierarchy generation
o Clustering
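
Two of the reduction methods above, correlation analysis to reduce the number of variables and sampling to reduce the number of cases, can be sketched as follows. The 0.95 threshold, the column names, the sample size, and the fixed seed are assumptions chosen for the illustration; the sketch also assumes no column is constant, since the Pearson coefficient is undefined in that case.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with a kept one."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold
               for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical data: height appears twice, in centimetres and in inches.
columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [70, 55, 80, 60, 75],
}
reduced = drop_correlated(columns)

# Reduce the number of cases with a simple random sample (seeded for
# repeatability); 1000 record ids stand in for the full data set.
rng = random.Random(42)
sample_ids = rng.sample(range(1000), 50)
```

Simple random sampling does not address skewed data by itself; balancing classes would need a stratified or weighted scheme, which this sketch leaves out.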