
DATA MINING

Introduction
As information technology (IT) has progressed, the volume and variety of collected
data have grown enormously in recent years. Extracting meaningful information from
such vast stores of data is beyond human capability, so it has become necessary to
develop algorithms that can do so (Chowdhary, 2020).
Data mining has attracted more and more attention in recent years, probably
because of the popularity of the “big data” concept. Data mining is the process of
discovering interesting patterns and knowledge from large amounts of data. As a highly
application-driven discipline, it has been successfully applied to many domains, such
as business intelligence, Web search, scientific discovery, and digital libraries
(Xu et al., 2014).
The Process of KDD

The term “data mining” is often treated as a synonym for another term, “knowledge
discovery from data” (KDD), which highlights the goal of the mining process. See
Figure 1.

• Step 1: Data preprocessing. Basic operations include data selection (retrieving the
data relevant to the KDD task from the database), data cleaning (removing noise and
inconsistent data, handling missing data fields, etc.), and data integration
(combining data from multiple sources) (Xu et al., 2014).

• Step 2: Data transformation. The goal is to transform data into forms appropriate
for the mining task, that is, to find useful features to represent the data. Feature
selection and feature transformation are the basic operations (Xu et al., 2014).

• Step 3: Data mining. This is the essential step, in which intelligent methods are
employed to extract data patterns (e.g. association rules, clusters, classification
rules) (Xu et al., 2014).

• Step 4: Pattern evaluation and presentation. Basic operations include identifying
the truly interesting patterns that represent knowledge, and presenting the mined
knowledge in an easy-to-understand fashion (Xu et al., 2014).

Figure 1. An overview of the KDD process.
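
The four KDD steps can be sketched end to end on a toy dataset. This is a minimal illustration in plain Python, not a method taken from the cited sources: the records, the item names, and the support threshold are hypothetical, and simple pair counting stands in for the mining step.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, basket of purchased items).
raw = [
    (1, ["bread", "milk"]),
    (2, ["bread", "milk", "eggs"]),
    (3, None),                      # noisy record with a missing basket
    (4, ["milk", "eggs"]),
    (5, ["bread", "milk"]),
]

# Step 1: preprocessing (selection and cleaning): drop records with no basket.
clean = [basket for _, basket in raw if basket]

# Step 2: transformation: represent each basket as a set of items.
transactions = [frozenset(basket) for basket in clean]

# Step 3: mining: count how often each pair of items occurs together.
pair_counts = Counter()
for t in transactions:
    items = sorted(t)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            pair_counts[(items[i], items[j])] += 1

# Step 4: evaluation and presentation: keep pairs that meet a support threshold.
MIN_SUPPORT = 0.5  # assumed threshold, chosen only for this example
frequent = {
    pair: count / len(transactions)
    for pair, count in pair_counts.items()
    if count / len(transactions) >= MIN_SUPPORT
}
for pair, support in sorted(frequent.items()):
    print(f"{pair[0]} + {pair[1]}: support {support:.2f}")
```

In a real project each step would be far more involved; the point here is only how the output of one step feeds the next.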

Theoretical framework
Goals of Data Mining
The field of data mining aims to explore very large data sets efficiently, using methods
that are convenient, easy, and practical, and that require neither extensive training
nor a large workforce. All data mining applications share some common goals:
identifying patterns in the data, interpreting those patterns, and then performing
prediction or description, either qualitatively or quantitatively, for all the data,
including data that may be generated in the near future (Chowdhary, 2020).
DATA PREPARATION

Data preparation (or data cleansing) tasks are likely to be performed repeatedly and
in no fixed order. These tasks include the selection of tables, records, and
attributes, as well as the transformation and cleansing of data in preparation for
the modelling tools (CEUPE, 2021).

Figure 2 shows the steps of data preparation. The first step, data description,
identifies the dataset that will be used for modelling.

Data selection

Decide which data will be used for analysis. Criteria include relevance to the data
mining objectives, quality, and technical constraints such as limits on data volume or
data types. Note that data selection covers the selection of attributes (columns) as
well as the selection of records (rows) in a table (Dataprix, 2007).
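
Selecting both columns and rows can be sketched as follows. This is only an illustration: the table, the attribute names, and the age criterion are hypothetical.

```python
# Hypothetical customer table: each record is a dict of attribute -> value.
records = [
    {"id": 1, "age": 34, "country": "EC", "notes": "vip", "spend": 120.0},
    {"id": 2, "age": 17, "country": "EC", "notes": "", "spend": 15.5},
    {"id": 3, "age": 52, "country": "PE", "notes": "", "spend": 80.0},
]

# Attribute (column) selection: keep only fields relevant to the mining goal.
KEEP = ("id", "age", "spend")

# Record (row) selection: an assumed relevance criterion, adults only.
def is_relevant(rec):
    return rec["age"] >= 18

selected = [{k: rec[k] for k in KEEP} for rec in records if is_relevant(rec)]
print(selected)
```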

Data Cleaning
Raise the quality of the data to the level required by the selected analysis
techniques. This may involve selecting clean subsets of the data, inserting suitable
default values, or more ambitious techniques such as estimating missing data by
modelling (Dataprix, 2007).
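
Two of these cleaning options, default insertion and a very simple estimate of missing values, might look like the sketch below. The records are hypothetical, and the mean is used only as the simplest possible stand-in for "estimation by modelling".

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 30, "city": "Quito"},
    {"age": None, "city": "Cuenca"},
    {"age": 50, "city": None},
]

# Estimate missing ages with the mean of the observed ages.
observed_ages = [r["age"] for r in rows if r["age"] is not None]
age_estimate = mean(observed_ages)

cleaned = [
    {
        "age": r["age"] if r["age"] is not None else age_estimate,
        # Insert a default for a missing categorical value.
        "city": r["city"] if r["city"] is not None else "unknown",
    }
    for r in rows
]
print(cleaned)
```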

Construct data

This task includes constructive data preparation operations such as the production
of derived attributes, the entry of new records, or the transformation of values for
existing attributes (Dataprix, 2007).
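
As a small sketch of data construction, the example below derives a new attribute and normalises an existing one. The order records and field names are hypothetical.

```python
# Hypothetical order records.
orders = [
    {"quantity": 2, "unit_price": 4.5, "segment": "Retail "},
    {"quantity": 10, "unit_price": 1.25, "segment": "WHOLESALE"},
]

constructed = [
    {
        **o,
        "total": o["quantity"] * o["unit_price"],   # derived attribute
        "segment": o["segment"].strip().lower(),    # transformed existing value
    }
    for o in orders
]
print(constructed)
```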

Integrate data

These are the methods by which information is combined from multiple tables or
records to create new records or values. Table merging refers to joining two or more
tables that hold different information about the same objects (Dataprix, 2007).
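
A merge of two tables on a shared key, combined with an aggregation that creates a new value, can be sketched like this. Both tables and the key name `customer_id` are hypothetical.

```python
# Two hypothetical tables describing the same customers.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Luis"},
]
purchases = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 2, "amount": 12.0},
]

# Aggregate one table to produce a new value per customer.
totals = {}
for p in purchases:
    totals[p["customer_id"]] = totals.get(p["customer_id"], 0.0) + p["amount"]

# Merge: join both sources on the shared key.
merged = [{**c, "total_spent": totals.get(c["customer_id"], 0.0)} for c in customers]
print(merged)
```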

Format data

Formatting transformations are mainly syntactic modifications made to the data that
do not change its meaning but might be required by the modelling tool. Some tools
have requirements on the order of attributes, such as the first field being a unique
identifier for each record or the last field being the outcome field that the model
should predict (Dataprix, 2007).

It may also be important to change the order of the records in the dataset; for
example, the modelling tool may require the records to be sorted by the value of the
outcome attribute.
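
Both kinds of formatting, reordering attributes and reordering records, can be sketched with the standard `csv` module. The records and the assumed tool requirement (identifier first, outcome last) are hypothetical.

```python
import csv
import io

# Hypothetical records with the fields in an arbitrary order.
records = [
    {"churned": 1, "age": 41, "id": "c3"},
    {"churned": 0, "age": 29, "id": "c1"},
    {"churned": 1, "age": 35, "id": "c2"},
]

# Assumed tool requirement: identifier first, outcome attribute last.
FIELD_ORDER = ["id", "age", "churned"]

# Sort records by the outcome attribute (a stable sort keeps ties in input order).
ordered = sorted(records, key=lambda r: r["churned"])

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELD_ORDER, lineterminator="\n")
writer.writeheader()
writer.writerows(ordered)
formatted = buffer.getvalue()
print(formatted)
```

The values themselves are unchanged; only the column order and row order differ from the input.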

Data mining techniques:


Decision trees

Decision trees are based on a classification algorithm that, starting from a root
node, develops branches (decisions) and determines the potential outcome of each
branch (DATAHACK, 2021).
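
A minimal sketch of how such a tree can be grown and used is shown below. It assumes categorical attributes and uses Gini impurity to choose each split, which is one common choice and not necessarily the one the cited source has in mind; the weather data and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, features, target):
    labels = [r[target] for r in rows]
    # Leaf: all labels agree or no features remain; predict the majority label.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def weighted_gini(feature):
        groups = {}
        for r in rows:
            groups.setdefault(r[feature], []).append(r[target])
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())

    # Branch on the feature whose split leaves the purest groups.
    best = min(features, key=weighted_gini)
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return (best, branches)

def predict(tree, row):
    # Follow branches until a leaf label is reached.
    # Raises KeyError for attribute values not seen during training.
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree

# Hypothetical training data.
data = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
tree = build_tree(data, ["outlook", "windy"], "play")
print(predict(tree, {"outlook": "sunny", "windy": "no"}))
```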

Neural networks

Neural networks are models that, through machine learning, attempt to fill the
interpretation gaps in a system. In doing so, they mimic, to some extent, the
connections between neurons that occur in the nervous system of living beings
(DATAHACK, 2021).
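
The simplest building block of such a model is a single artificial neuron. The sketch below trains one with the classic perceptron learning rule on the logical AND function. It is an illustration only: a practical neural network connects many such units in layers, and the training data, learning rate, and epoch count here are assumptions chosen for the example.

```python
def step(z):
    """Threshold activation: the neuron fires (1) only for positive input."""
    return 1 if z > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - step(z)
            # Perceptron rule: nudge weights toward the correct output.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Training data for logical AND: (inputs, expected output).
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
predictions = [
    step(sum(w * x for w, x in zip(weights, inputs)) + bias)
    for inputs, _ in AND_GATE
]
print(predictions)
```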

Clustering

Clustering in data mining aims at segmenting elements that have some defining
characteristic in common. To do so, the algorithm takes into account conditions of
closeness or similarity (DATAHACK, 2021).
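
One widely used closeness-based method is k-means, sketched below for two-dimensional points. The source does not name a specific algorithm, so k-means is an assumption here, as are the points and the fixed starting centroids (fixed so that the run is reproducible).

```python
import math

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```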

Bayesian networks

Bayesian networks are graphical representations of probabilistic dependence
relationships between variables. They are used to solve both descriptive and
predictive problems (DATAHACK, 2021).
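
As a sketch, consider a three-variable network in which Rain and Sprinkler both influence WetGrass. The structure and every probability below are hypothetical. Summing over the joint distribution answers a predictive question (how likely is wet grass?) and a descriptive one (given wet grass, how likely is rain?).

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = True | Rain, Sprinkler)
P_WET_GIVEN = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """Joint probability, factorised along the edges of the network."""
    p_parents = (P_RAIN if rain else 1 - P_RAIN) * \
                (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet_true = P_WET_GIVEN[(rain, sprinkler)]
    return p_parents * (p_wet_true if wet else 1 - p_wet_true)

# Predictive query: P(WetGrass = True), summing over both parents.
p_wet = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))

# Descriptive (diagnostic) query: P(Rain = True | WetGrass = True).
p_rain_and_wet = sum(joint(True, s, True) for s in [True, False])
p_rain_given_wet = p_rain_and_wet / p_wet

print(round(p_wet, 4), round(p_rain_given_wet, 4))
```

Enumeration like this is only practical for very small networks; larger ones need more efficient inference methods.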

Regression

Regression as a data mining technique takes a historical series as a starting point
to predict what will happen next (DATAHACK, 2021).
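
A minimal sketch of this idea fits a straight line to a short historical series by ordinary least squares and extrapolates one step ahead. The series is hypothetical, and a linear trend is an assumption; real series often need richer models.

```python
# Hypothetical historical series observed at time steps 0, 1, 2, 3.
series = [10.0, 12.0, 14.0, 16.0]
times = list(range(len(series)))

mean_t = sum(times) / len(times)
mean_y = sum(series) / len(series)

# Ordinary least squares for y = intercept + slope * t.
slope = (
    sum((t - mean_t) * (y - mean_y) for t, y in zip(times, series))
    / sum((t - mean_t) ** 2 for t in times)
)
intercept = mean_y - slope * mean_t

# Predict the value at the next time step.
next_t = len(series)
forecast = intercept + slope * next_t
print(forecast)
```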

CONCLUSION:
Data mining is gaining popularity because many companies use it to analyse customer
behaviour, which allows them to offer products suited to their customers' needs and
to obtain higher profits.

References
Chowdhary, K. R. (2020). Data mining. In Fundamentals of Artificial Intelligence. Springer, New
Delhi. https://bibliotecas.ups.edu.ec:2582/10.1007/978-81-322-3972-7_17

Xu, L., Jiang, C., Wang, J., Yuan, J., & Ren, Y. (2014). Information security in big data: Privacy
and data mining. IEEE Access, 2, 1149-1176. doi:10.1109/ACCESS.2014.2362522

CEUPE. (2021, June 14). Proceso del Data Mining. Retrieved from
https://www.ceupe.com/blog/proceso-del-data-mining.html

DATAHACK. (2021, January 7). Las 7 técnicas de minería de datos más utilizadas en Big Data.
Retrieved from https://www.datahack.es/blog/big-data/tecnicas-mineria-datos/

Dataprix. (2007, September 15). Preparación de datos. Retrieved from
https://www.dataprix.com/es/metodologia-crisp-dm-mineria-datos/preparacion-datos
