The Process of KDD: Y. Ren., 2014)

DATA MINING
Introduction
As Information Technology (IT) has progressed, the volume and variety of collected
data have grown enormously in recent years. Extracting meaningful information from this
vast amount of data is therefore beyond human capability, and it has become necessary
to develop algorithms that can extract it from these vast stores of data
(Chowdhary, 2020).
Data mining has attracted more and more attention in recent years, probably
because of the popularity of the “big data” concept. Data mining is the process of
discovering interesting patterns and knowledge from large amounts of data. As a highly
application-driven discipline, data mining has been successfully applied to many
domains, such as business intelligence, Web search, scientific discovery, digital
libraries, etc. (L. Xu, C. Jiang, J. Wang, J. Yuan and Y. Ren, 2014)
The Process of KDD
The term “data mining” is often treated as a synonym for another term, “knowledge
discovery from data” (KDD), which highlights the goal of the mining process. The KDD
process consists of the following steps (see Figure 1).
• Step 1: Data preprocessing. Basic operations include data selection (to retrieve
data relevant to the KDD task from the database), data cleaning (to remove noise
and inconsistent data, to handle missing data fields, etc.) and data integration
(to combine data from multiple sources) (L. Xu, C. Jiang, J. Wang, J. Yuan and
Y. Ren, 2014).
• Step 2: Data transformation. The goal is to transform data into forms appropriate
for the mining task, that is, to find useful features to represent the data. Feature
selection and feature transformation are basic operations (L. Xu, C. Jiang, J.
Wang, J. Yuan and Y. Ren, 2014).
• Step 3: Data mining. This is an essential process where intelligent methods are
employed to extract data patterns (e.g. association rules, clusters, classification
rules, etc.) (L. Xu, C. Jiang, J. Wang, J. Yuan and Y. Ren, 2014).
• Step 4: Pattern evaluation and presentation. Basic operations include identifying
the truly interesting patterns which represent knowledge, and presenting the
mined knowledge in an easy-to-understand fashion (L. Xu, C. Jiang, J. Wang, J.
Yuan and Y. Ren, 2014).
Figure 1. An overview of the KDD process.
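The four KDD steps can be sketched as a small pipeline. This is only a minimal
illustration in plain Python, not the method of the cited authors: the toy records,
the field names (`age`, `bought`), the `min_rate` threshold, and all function names
are hypothetical.

```python
# Toy records from two hypothetical sources; None marks a missing field.
source_a = [{"id": 1, "age": 25, "bought": "yes"}, {"id": 2, "age": None, "bought": "no"}]
source_b = [{"id": 3, "age": 52, "bought": "yes"}, {"id": 4, "age": 47, "bought": "no"}]

def preprocess(*sources):
    """Step 1: integrate the sources, then drop records with missing fields."""
    merged = [rec for src in sources for rec in src]
    return [rec for rec in merged if all(v is not None for v in rec.values())]

def transform(records):
    """Step 2: derive a feature suited to the mining task (an age group)."""
    return [
        {"age_group": "young" if r["age"] < 40 else "older", "bought": r["bought"]}
        for r in records
    ]

def mine(records):
    """Step 3: extract a simple pattern, the purchase rate per age group."""
    counts = {}
    for r in records:
        total, yes = counts.get(r["age_group"], (0, 0))
        counts[r["age_group"]] = (total + 1, yes + (r["bought"] == "yes"))
    return {group: yes / total for group, (total, yes) in counts.items()}

def present(patterns, min_rate=0.6):
    """Step 4: keep only the interesting patterns and render them readably."""
    return [f"{g}: {rate:.0%} bought" for g, rate in sorted(patterns.items()) if rate >= min_rate]

report = present(mine(transform(preprocess(source_a, source_b))))
print(report)
```

Each function maps to one step, so a real project could replace any stage (for
example, a proper mining algorithm in `mine`) without touching the others.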
Theoretical framework
Goals of Data Mining
The field of data mining aims to explore very large data sets efficiently, using methods
that are convenient, easy, and practical, without requiring extensive training or a
large workforce. All data mining applications share some common goals: identifying
patterns in the data, interpreting these patterns, and then performing prediction or
description, either qualitatively or quantitatively, for all the data, including data
that may be generated in the near future (Chowdhary, 2020).
DATA PREPARATION
Data preparation tasks, including data cleansing, are likely to be performed repeatedly
and in no fixed order. These tasks include the selection of tables, records, and
attributes, as well as the transformation and cleansing of data in preparation for the
modelling tools (CEUPE, 2021).
Figure 2 shows the steps of data preparation. The first step, data description,
concerns the dataset that will be used for modelling.
Data selection
Decide which data will be used for analysis. Criteria include relevance to data
mining objectives, quality, and technical constraints such as limits on data volume or
data types. Note that data selection covers the selection of attributes (columns) as well
as the selection of records (rows) in a table (Dataprix, 2007).
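Selecting both attributes (columns) and records (rows) might look like the following
sketch. The table, its column names, and the `select` helper are hypothetical and
chosen only for illustration.

```python
# Hypothetical customer table: a list of rows, each row a dict of attributes.
table = [
    {"id": 1, "country": "EC", "age": 34, "spend": 120.0, "notes": "n/a"},
    {"id": 2, "country": "MX", "age": 51, "spend": 80.5, "notes": "n/a"},
    {"id": 3, "country": "EC", "age": 19, "spend": 15.0, "notes": "n/a"},
]

def select(rows, columns, keep_row):
    """Keep only rows passing keep_row, projected onto the chosen columns."""
    return [{c: row[c] for c in columns} for row in rows if keep_row(row)]

# Column selection drops the uninformative "notes" field;
# row selection keeps only the records relevant to the mining objective.
selected = select(table, ["id", "age", "spend"], lambda r: r["country"] == "EC")
print(selected)
```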
Data Cleaning
Raise the quality of the data to the level required by the selected analysis
techniques. This may involve selection of clean data subsets, insertion of appropriate
default values, or more ambitious techniques such as estimation of missing data by
modelling (Dataprix, 2007).
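The three cleaning options named above can be sketched as follows. The rows and the
`"unknown"` default are hypothetical, and imputing with the mean is used here only as
the crudest possible stand-in for "estimation by modelling".

```python
from statistics import mean

rows = [
    {"age": 30, "city": "Quito"},
    {"age": None, "city": "Cuenca"},
    {"age": 50, "city": None},
]

# Option 1: keep only the clean subset (rows with no missing values).
clean_subset = [r for r in rows if None not in r.values()]

# Option 2: insert a default value for a missing categorical field.
with_defaults = [{**r, "city": r["city"] or "unknown"} for r in rows]

# Option 3: estimate a missing numeric field from the known values.
known_ages = [r["age"] for r in rows if r["age"] is not None]
imputed = [
    {**r, "age": r["age"] if r["age"] is not None else mean(known_ages)}
    for r in with_defaults
]
print(imputed)
```

Which option is appropriate depends on how much data is missing and on what the
selected analysis technique can tolerate.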
Construct data
This task includes constructive data preparation operations such as the production
of derived attributes, the creation of entirely new records, or the transformation of
values for existing attributes (Dataprix, 2007).
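As a small illustration of constructing data, the sketch below adds a derived
attribute and transforms an existing one. The order records and field names are
hypothetical.

```python
orders = [
    {"quantity": 2, "unit_price": 9.5, "country": "ec"},
    {"quantity": 1, "unit_price": 20.0, "country": "mx"},
]

constructed = [
    {
        **o,
        "total": o["quantity"] * o["unit_price"],  # derived attribute
        "country": o["country"].upper(),           # transformed existing value
    }
    for o in orders
]
print(constructed)
```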
Integrate data
These are the methods by which information is combined from multiple tables or
records to create new records or values. Table merging refers to the joining of two
or more tables that hold different information about the same objects
(Dataprix, 2007).
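Table merging can be illustrated with a simple join on a shared key. This sketch
assumes the key is unique in the right-hand table; the tables and the `merge` helper
are hypothetical.

```python
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Luis"}]
purchases = [{"id": 1, "total": 120.0}, {"id": 2, "total": 80.5}, {"id": 3, "total": 15.0}]

def merge(left, right, key):
    """Inner join: combine rows from both tables that share the same key value."""
    index = {row[key]: row for row in right}  # assumes key is unique in `right`
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

merged = merge(customers, purchases, "id")
print(merged)
```

The purchase with `id` 3 is dropped because no customer record describes the same
object, which is the behaviour of an inner join.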
Format data
Formatting transformations refer to mainly syntactic modifications made to the
data that do not change its meaning but might be required by the modelling tool. Some
tools have requirements on the order of attributes, such as the first field being a unique
identifier for each record or the last field being the result field that the model should
predict (Dataprix, 2007).
It may be important to change the order of the records in the dataset. Perhaps the
modelling tool requires the records to be sorted according to the value of the outcome
attribute.
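Both kinds of formatting, reordering attributes and sorting records, are purely
syntactic and can be sketched as below. The field names and the `format_rows` helper
are hypothetical; a real modelling tool would dictate its own layout.

```python
records = [
    {"outcome": "yes", "age": 34, "id": 2},
    {"outcome": "no", "age": 51, "id": 1},
]

def format_rows(rows, id_field, outcome_field):
    """Put the identifier first and the outcome last, then sort by outcome."""
    middle = sorted(k for k in rows[0] if k not in (id_field, outcome_field))
    order = [id_field, *middle, outcome_field]
    reordered = [[row[k] for k in order] for row in rows]
    return order, sorted(reordered, key=lambda r: r[-1])

header, formatted = format_rows(records, "id", "outcome")
print(header)
print(formatted)
```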
Data mining techniques:

Decision trees
Decision trees are based on the application of a classification algorithm that,
starting from a root node, develops branches (decisions) and determines the potential
outcome of each branch (DATAHACK, 2021).
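To make the idea of a node that develops branches concrete, the sketch below learns
the smallest possible tree: a single node with two branches (a "decision stump").
It picks the split by counting misclassifications, which is a simplification;
practical tree learners typically use criteria such as information gain or Gini
impurity and grow many levels. The data and function names are hypothetical.

```python
def learn_stump(values, labels):
    """Pick the threshold on one numeric attribute that misclassifies the fewest examples."""
    best = None
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        # Each branch predicts its majority label.
        left_label = max(set(left), key=left.count)
        right_label = max(set(right), key=right.count) if right else left_label
        errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]

ages = [22, 25, 31, 47, 52, 60]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, below, above = learn_stump(ages, bought)

def predict(age):
    """Follow the branch that matches the input and return its outcome."""
    return below if age <= threshold else above

print(threshold, predict(28), predict(50))
```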
Neural networks
Neural networks are models that, through machine learning, attempt to fill the
interpretation gaps in a system. In doing so, they mimic, to some extent, the connections
between neurons that occur in the nervous system of living beings (DATAHACK, 2021).
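The learning idea can be illustrated with a single artificial neuron trained by the
perceptron rule. This is far simpler than the multi-layer networks used in practice,
and the task (a logical AND gate), the learning rate, and the epoch count are
assumptions made only for this sketch.

```python
def step(x):
    """Activation: the neuron fires (1) only if its weighted input is positive."""
    return 1 if x > 0 else 0

def train_neuron(samples, epochs=20, rate=1):
    """Perceptron rule: nudge the weights toward each misclassified example."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            output = step(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(and_gate)
predictions = [
    step(sum(w * x for w, x in zip(weights, inputs)) + bias) for inputs, _ in and_gate
]
print(predictions)
```

The weights play the role of connection strengths between neurons; learning consists
of adjusting them until the outputs match the examples.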
Clustering
Clustering in data mining aims at segmenting elements that have some defining
characteristic in common. In this case, the algorithm takes into account conditions of
closeness or similarity to do its work (DATAHACK, 2021).
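One common way to segment by closeness is k-means, shown here in one dimension as a
minimal sketch. The source does not name a specific algorithm, so k-means, the data
points, and the initial centers are all assumptions for illustration.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = k_means_1d(points, centers=[1.0, 10.0])
print(centers, clusters)
```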
Bayesian networks
Bayesian networks are graphical representations of probabilistic dependence
relationships between variables. They are used to solve both descriptive and predictive
problems (DATAHACK, 2021).
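A tiny network with three variables shows how the graph structure is used. The
network (Rain and Sprinkler both influencing WetGrass) and every probability below
are made up for illustration; inference is done by brute-force enumeration, which is
only practical for very small networks.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all numbers are made up).
p_rain = 0.2
p_sprinkler = 0.1
# P(WetGrass = True | rain, sprinkler), keyed by (rain, sprinkler).
p_wet = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability, factorised along the edges of the network."""
    p = (p_rain if rain else 1 - p_rain) * (p_sprinkler if sprinkler else 1 - p_sprinkler)
    p_w = p_wet[(rain, sprinkler)]
    return p * (p_w if wet else 1 - p_w)

# Forward query: how likely is wet grass overall?
p_wet_grass = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
# Diagnostic query: given wet grass, how likely is it that it rained?
p_rain_given_wet = sum(joint(True, s, True) for s in [True, False]) / p_wet_grass
print(round(p_wet_grass, 4), round(p_rain_given_wet, 4))
```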
Regression
Regression as a data mining technique takes a historical series as a starting point
to predict what will happen next (DATAHACK, 2021).
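The simplest instance is a straight line fitted to the series by ordinary least
squares and extended one period ahead. The sales figures are hypothetical, and a
linear trend is an assumption; real series often need richer models.

```python
def fit_line(ys):
    """Ordinary least squares for y = slope * t + intercept, with t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    variance = sum((t - t_mean) ** 2 for t in ts)
    slope = covariance / variance
    return slope, y_mean - slope * t_mean

sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical historical series
slope, intercept = fit_line(sales)
forecast = slope * len(sales) + intercept  # prediction for the next period
print(slope, intercept, forecast)
```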
CONCLUSION:
Data mining is currently gaining popularity, as many companies use it to analyse
customer behaviour and thereby offer products that match their customers' needs
and obtain higher profits.
References
Chowdhary K.R. (2020). Data Mining. In: Fundamentals of Artificial Intelligence. Springer, New
Delhi. https://bibliotecas.ups.edu.ec:2582/10.1007/978-81-322-3972-7_17
L. Xu, C. Jiang, J. Wang, J. Yuan and Y. Ren (2014). "Information Security in Big Data: Privacy
and Data Mining," in IEEE Access, vol. 2, pp. 1149-1176. doi: 10.1109/ACCESS.2014.2362522.
CEUPE. (14 June 2021). Proceso del Data Mining. Retrieved from
https://www.ceupe.com/blog/proceso-del-data-mining.html
DATAHACK. (7 January 2021). LAS 7 TÉCNICAS DE MINERÍA DE DATOS MÁS UTILIZADAS EN BIG DATA.
Retrieved from https://www.datahack.es/blog/big-data/tecnicas-mineria-datos/
Dataprix. (15 September 2007). Preparación de datos. Retrieved from
https://www.dataprix.com/es/metodologia-crisp-dm-mineria-datos/preparacion-datos
