INTELLIGENCE
UNIT 1
INTRODUCTION TO DATA MINING & PRE-PROCESSING
TOPICS TO BE COVERED
1. DATA MINING - DEFINITION
2. DATA MINING FUNCTIONALITIES
3. KDD PROCESS
4. DATA CLEANING - MISSING VALUES, NOISY DATA, DATA INTEGRATION
5. DATA REDUCTION - DATA CUBE AGGREGATION, DIMENSIONALITY REDUCTION, DATA COMPRESSION, NUMEROSITY REDUCTION
DATA MINING DEFINITION
• The amount of data kept in computer files and databases is growing at a fast rate.
Users of these data expect more sophisticated information from them.
Data mining discovers knowledge or information that you never knew was
present in your data.
The data mining process itself is interactive and may take considerable elapsed time.
To ensure that the results are useful and accurate, interaction with both domain
experts and technical experts might be needed throughout the process.
KDD PROCESS
The KDD process consists of the following steps:
• Selection: The data needed for the data mining process may be obtained from
many different, heterogeneous data sources. This first step obtains data from various
databases, files, and non-electronic sources.
• Preprocessing: The data to be used by the process may contain incorrect or
missing values.
There may be anomalous data from multiple sources involving different data
types and metrics. Erroneous data may be corrected or removed, whereas missing
data must be supplied or predicted (often using data mining tools).
• Transformation: Data from different sources must be converted into a common
format for processing. Some data may be encoded or transformed into more
usable formats.
• Data mining: Based on the data mining task being performed, this step applies
algorithms to the transformed data to generate the desired results.
• Interpretation/evaluation: How the data mining results are presented to users is
extremely important because the usefulness of the results depends on it.
Various visualization and GUI strategies are used at this step. The transformation
techniques applied earlier in the process serve the same goal: they make the data
easier to mine and more useful, and they yield more meaningful results.
• Knowledge representation: Visualization and knowledge representation
techniques are used to present the mined knowledge to users.
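The steps above can be sketched end to end in a few lines of Python. This is only an illustrative sketch, not a definitive implementation: the two data sources (STORE_A, STORE_B), the rule that a negative amount counts as erroneous, the use of a per-source mean to fill missing values, and the choice of item counting as the mining task are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw inputs standing in for two heterogeneous sources.
STORE_A = [  # amounts in dollars, stored as strings
    {"customer": "ann", "item": "milk", "amount": "2.50"},
    {"customer": "bob", "item": "bread", "amount": None},     # missing value
    {"customer": "ann", "item": "bread", "amount": "-1.00"},  # erroneous value
]
STORE_B = [  # amounts in cents, stored as integers
    ("cy", "milk", 250),
    ("cy", "bread", 180),
    ("ann", "milk", 260),
]


def select():
    """Selection: gather records from both sources into one list of dicts."""
    rows = [dict(r, source="A") for r in STORE_A]
    rows += [
        {"customer": c, "item": i, "amount": a, "source": "B"}
        for c, i, a in STORE_B
    ]
    return rows


def preprocess(rows):
    """Preprocessing: remove erroneous rows and supply missing amounts."""
    parsed = [
        dict(r, amount=None if r["amount"] is None else float(r["amount"]))
        for r in rows
    ]
    # Assumption: a negative amount is an error, so the row is removed.
    kept = [r for r in parsed if r["amount"] is None or r["amount"] >= 0]
    # Assumption: a missing amount is filled with the mean of its own source.
    for source in {r["source"] for r in kept}:
        known = [
            r["amount"] for r in kept
            if r["source"] == source and r["amount"] is not None
        ]
        if not known:
            continue  # nothing to base a fill value on
        fill = mean(known)
        for r in kept:
            if r["source"] == source and r["amount"] is None:
                r["amount"] = fill
    return kept


def transform(rows):
    """Transformation: convert every amount to a common unit (dollars)."""
    return [
        dict(r, amount=r["amount"] / 100 if r["source"] == "B" else r["amount"])
        for r in rows
    ]


def mine(rows):
    """Data mining: a deliberately simple task, counting purchases per item."""
    return Counter(r["item"] for r in rows)


def report(counts):
    """Interpretation/evaluation: present the result as a text bar chart."""
    width = max(len(item) for item in counts)
    return [f"{item.ljust(width)} {'#' * n}" for item, n in counts.most_common()]


if __name__ == "__main__":
    cleaned = transform(preprocess(select()))
    for line in report(mine(cleaned)):
        print(line)
```

Each function corresponds to one step, so a step can be replaced (for example, a different rule for filling missing values) without touching the others.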
KDD PROCESS