DATA MINING & BUSINESS INTELLIGENCE
UNIT 1
INTRODUCTION TO DATA MINING & PRE-PROCESSING
TOPICS TO BE COVERED
1. DATA MINING - DEFINITION
2. DATA MINING FUNCTIONALITIES
3. KDD PROCESS
4. DATA CLEANING - MISSING VALUES, NOISY DATA, DATA INTEGRATION
5. DATA REDUCTION - DATA CUBE AGGREGATION, DIMENSIONALITY REDUCTION, DATA COMPRESSION, NUMEROSITY REDUCTION
DATA MINING DEFINITION
• The amount of data kept in computer files and databases is growing at a fast rate.
Users of these data expect more sophisticated information from them.

• Data mining addresses these needs.

• Data mining is the process of finding important information (or knowledge) in large amounts of data.
• Data sources can include databases, data warehouses, the web, and other
information repositories.
• Data mining has alternatively been called exploratory data analysis,
data-driven discovery, and deductive learning.
DATA MINING FUNCTIONALITIES
• Data mining functionalities specify the kinds of patterns to be found in data
mining tasks.
• Data mining tasks are classified into two categories: descriptive and predictive.

• Descriptive mining tasks characterize general properties of data in the database.

• Predictive mining tasks perform inference on current data in order to make predictions.
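
The two categories can be illustrated with a minimal Python sketch. The monthly sales figures are made-up values, and the naive trend extrapolation merely stands in for a real predictive model; neither comes from the slides.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative values only).
sales = [100, 110, 120, 130]

# Descriptive task: summarize general properties of the existing data.
summary = {"min": min(sales), "max": max(sales), "mean": mean(sales)}

# Predictive task: infer from the current data to estimate an unseen value.
# Assumption: the average month-to-month change continues for one more month.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
next_month = sales[-1] + mean(changes)

print(summary)
print(next_month)
```

The descriptive part only reports what is already in the data, while the predictive part produces a value that is not present in it.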
• Data mining functionalities, and the kinds of patterns they can discover, are:

1. Concept/Class Description: Characterization and Discrimination
2. Mining Frequent Patterns, Associations, and Correlations
3. Classification and Prediction
4. Cluster Analysis
5. Outlier Analysis
KDD PROCESS
Data mining is often defined as the process of knowledge discovery.

Data mining discovers knowledge or information that you never knew was
present in your data.

Knowledge discovery in databases (KDD) is the process of finding useful
information and patterns in data.
Data mining is the use of algorithms to extract the information and patterns
derived by the KDD process.
KDD is a process that involves many different steps.
The input to this process is data, and the output is the useful information desired by the users.
However, the objective may be unclear or inexact.

The process itself is interactive and may require much elapsed time.
To ensure the usefulness and accuracy of the results, interaction with both
domain experts and technical experts might be needed throughout the process.
The KDD process consists of the following five steps:
• Selection: The data needed for the data mining process may be obtained from
many different, heterogeneous data sources. This first step obtains the data from
various databases, files, and nonelectronic sources.
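
As a rough sketch of the selection step, the Python snippet below gathers records from three stand-in sources. The CSV text, the JSON text, the manually keyed list, and the `select_records` helper are all hypothetical and serve only to illustrate pulling data from heterogeneous sources.

```python
import csv
import io
import json

# Hypothetical sources: a file, a database export, and paper records keyed in by hand.
csv_source = "id,amount\n1,250\n2,400\n"
json_source = '[{"id": 3, "amount": 150}]'
manual_records = [(4, 300)]

def select_records():
    """Collect raw records from every source into one list of dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_source)):
        records.append({"id": row["id"], "amount": row["amount"], "source": "csv"})
    for item in json.loads(json_source):
        records.append({"id": item["id"], "amount": item["amount"], "source": "json"})
    for rec_id, amount in manual_records:
        records.append({"id": rec_id, "amount": amount, "source": "manual"})
    return records

selected = select_records()
print(len(selected))
```

Note that the selected values are still raw: the CSV fields are strings while the other sources yield integers, which is typical of data gathered from heterogeneous sources.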
• Preprocessing: The data to be used by the process may contain incorrect or
missing values. There may be anomalous data from multiple sources involving
different data types and metrics. Erroneous data may be corrected or removed,
whereas missing data must be supplied or predicted (often using data mining tools).
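
A minimal sketch of this step in Python, assuming a hypothetical age column, an assumed validity range of 0 to 120, and mean imputation as the way to supply missing values (one common choice among several; the slides do not prescribe a method):

```python
from statistics import mean

# Hypothetical age column: None marks a missing value; -5 and 240 are erroneous.
ages = [25, None, 31, -5, 40, 240, None]

def is_valid(age):
    """Assumed validity rule for this example: ages must lie in 0..120."""
    return age is not None and 0 <= age <= 120

# Remove erroneous values by treating them as missing.
cleaned = [a if is_valid(a) else None for a in ages]

# Supply the missing values with the mean of the valid observations.
valid = [a for a in cleaned if a is not None]
fill = mean(valid)
imputed = [a if a is not None else fill for a in cleaned]

print(imputed)
```

Predicting each missing value with a model built from the other attributes would be the "predicted" alternative mentioned above; the mean is simply the easiest option to show.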
• Transformation: Data from different sources must be converted into a common
format for processing. Some data may be encoded or transformed into more
usable formats.
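
The following Python sketch shows one way such a conversion might look. The two sources, their field names, and the chosen common format (ISO date, amount in dollars) are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical records from two sources with different date formats and units.
source_a = [{"date": "2024-01-15", "amount_usd": 12.5}]
source_b = [{"date": "15/02/2024", "amount_cents": 990}]

def to_common(record):
    """Convert a record to the assumed common format: ISO date, amount in dollars."""
    if "amount_usd" in record:
        day = datetime.strptime(record["date"], "%Y-%m-%d").date()
        amount = float(record["amount_usd"])
    else:
        day = datetime.strptime(record["date"], "%d/%m/%Y").date()
        amount = record["amount_cents"] / 100
    return {"date": day.isoformat(), "amount": amount}

unified = [to_common(r) for r in source_a + source_b]
print(unified)
```

After this step every record has the same fields, types, and units, so a single algorithm can process all of them.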
• Data mining: Based on the data mining task being performed, this step applies
algorithms to the transformed data to generate the desired results.
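
As one possible instance of this step, the sketch below counts frequently co-occurring item pairs, a very simple form of frequent-pattern mining. The transactions, the `frequent_pairs` helper, and the `min_support` threshold are hypothetical; a real task would normally use a dedicated algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transformed data: one set of purchased items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk", "jam"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that occur in at least min_support transactions."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(transactions, min_support=2)
print(patterns)
```

The choice of algorithm depends on the task: the same transformed data could instead be fed to a classifier or a clustering method.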
• Interpretation/evaluation: How the data mining results are presented to the users
is extremely important because the usefulness of the results depends on it.
Various visualization and GUI strategies are used at this last step. Transformation
techniques are used to make the data easier to mine and more useful, and to
provide more meaningful results.
• Knowledge representation: Visualization and knowledge representation
techniques are used to present the mined knowledge to users.