Introduction
As Information Technology (IT) has progressed, the volume and variety of
collected data have grown enormously in recent years. Extracting meaningful
information from these vast stores of data is beyond human capability, so it has
become necessary to develop algorithms that can do so. (Chowdhary K.R., 2020)
Data mining has attracted more and more attention in recent years, probably
because of the popularity of the “big data” concept. Data mining is the process of
discovering interesting patterns and knowledge from large amounts of data. As a highly
application-driven discipline, data mining has been successfully applied to many
domains, such as business intelligence, Web search, scientific discovery, digital
libraries, etc. (L. Xu, C. Jiang, J. Wang, J. Yuan and Y. Ren, 2014)
The Process of KDD
The term “data mining” is often treated as a synonym for another term “knowledge
discovery from data” (KDD) which highlights the goal of the mining process. See
Figure 1.
Step 1: Data preprocessing. Basic operations include data selection (to retrieve
data relevant to the KDD task from the database), data cleaning (to remove noise
and inconsistent data, to handle the missing data fields, etc.) and data integration
(to combine data from multiple sources). (L. Xu, C. Jiang, J. Wang, J. Yuan and
Y. Ren, 2014)
Step 2: Data transformation. The goal is to transform data into forms appropriate
for the mining task, that is, to find useful features to represent the data. Feature
selection and feature transformation are basic operations. (L. Xu, C. Jiang, J.
Wang, J. Yuan and Y. Ren, 2014)
rules, etc.). (L. Xu, C. Jiang, J. Wang, J. Yuan and Y. Ren, 2014)
the truly interesting patterns which represent knowledge, and presenting the
Theoretical framework
Goals of Data Mining
The field of data mining aims to explore very large data sets efficiently, using methods
that are convenient, easy, and practical, without requiring extensive training or a
large workforce. All data mining applications share some common goals: identifying
patterns in the data, interpreting these patterns, and then performing prediction or
description, either qualitatively or quantitatively, for all the data, including data
that may be generated in the near future. (Chowdhary K.R., 2020)
DATA PREPARATION
Data preparation or data cleansing tasks are typically performed repeatedly and in
no fixed order. These tasks include the selection of tables, records, and attributes, as
well as the transformation and cleansing of data in preparation for the modelling tools.
In figure 2 we observe the steps for data preparation, in the first step of the data
Data selection
Decide which data will be used for analysis. Criteria include relevance to data
mining objectives, quality, and technical constraints such as limits on data volume or
data types. Note that data selection covers the selection of attributes (columns) as well
Data Cleaning
Raise the quality of the data to the level required by the selected analysis
techniques. This may involve selection of clean data subsets, insertion of appropriate
Construct data
This task includes the construction of data preparation operations such as the
Integrate data
These are the methods by which information is combined from multiple tables or
records to create new records or values. Table merging refers to the simultaneous
joining of two or more tables that have different information about the same object.
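
Table merging as described here can be sketched in a few lines of plain Python. This is only an illustrative sketch: the two tables, the `customer_id` key, and the helper name `merge_tables` are hypothetical and do not come from the source. It keeps only the objects that appear in both tables.

```python
# Hypothetical example tables: both hold different information about the
# same customers, identified by a shared "customer_id" key (an assumption).
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Luis"},
    {"customer_id": 3, "name": "Marta"},
]
purchases = [
    {"customer_id": 1, "total_spent": 120.0},
    {"customer_id": 3, "total_spent": 45.5},
]


def merge_tables(left, right, key):
    """Join two lists of records on `key`, keeping objects present in both."""
    # Index the right-hand table by key so each lookup is a dictionary access.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Combine the fields of both records into one new record.
            merged.append({**row, **match})
    return merged


merged = merge_tables(customers, purchases, "customer_id")
for record in merged:
    print(record)
```

Customer 2 has no purchase record, so it is dropped from the result. Whether unmatched records should be kept with empty values instead depends on the mining objective.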
Format data
data that do not change its meaning but might be required by the modelling tool. Some
tools have requirements on the order of attributes, such as the first field being a unique
identifier for each record or the last field being the result field that the model should
It may be important to change the order of the records in the dataset. Perhaps the
modelling tool requires the records to be sorted according to the value of the outcome
attribute.
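
The two formatting operations mentioned above (reordering attributes and sorting records) might look like the following sketch. The field names `record_id` and `outcome`, the sample records, and the helper `format_records` are assumptions made for illustration. The values themselves are never modified, only their order.

```python
# Hypothetical records; "record_id" is the unique identifier and
# "outcome" is the result field (both names are assumptions).
records = [
    {"age": 34, "outcome": "yes", "record_id": 102, "income": 2100},
    {"age": 51, "outcome": "no", "record_id": 101, "income": 3400},
    {"age": 27, "outcome": "yes", "record_id": 103, "income": 1800},
]


def format_records(rows, id_field, outcome_field):
    """Put the identifier first and the outcome last, then sort by outcome."""
    formatted = []
    for row in rows:
        middle = [k for k in row if k not in (id_field, outcome_field)]
        ordered_fields = [id_field] + middle + [outcome_field]
        # Python dicts preserve insertion order, so this fixes the field order.
        formatted.append({k: row[k] for k in ordered_fields})
    # sorted() is stable: records with equal outcomes keep their input order.
    return sorted(formatted, key=lambda r: r[outcome_field])


formatted = format_records(records, "record_id", "outcome")
for record in formatted:
    print(record)
```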
Decision trees
They are based on the application of a classification algorithm that, starting from
a node, develops branches (decisions) and determines the potential outcome of each
Neural networks
Neural networks are models that, through machine learning, attempt to fill the
interpretation gaps in a system. In doing so, they mimic, to some extent, the connections
between neurons that occur in the nervous system of living beings. [ CITATION DAT21 \l
2058 ]
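
To make the idea of a learning neuron concrete, the sketch below trains a single artificial neuron (a perceptron) on the logical AND function. The source does not describe any particular network or training rule; the perceptron update, the AND data set, and all names here are assumptions chosen as the smallest possible illustration. Integer weights and a learning rate of 1 keep the arithmetic exact.

```python
def step(x):
    """Threshold activation: the neuron fires (1) when its input is >= 0."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the threshold."""
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, learning_rate=1):
    """Adjust weights and bias whenever the neuron's output is wrong."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Perceptron rule: move each weight in proportion to the error.
            weights = [w + learning_rate * error * i
                       for w, i in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Illustrative training data: the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(predictions)
```

Practical neural networks combine many such units in layers and use other training procedures; this single neuron only shows how connection weights are adjusted from examples.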
Clustering
Clustering in data mining aims at segmenting elements that have some defining
characteristic in common. In this case, the algorithm takes into account conditions of
Bayesian networks
relationships between variables. They are used to solve both descriptive and predictive
Regression
CONCLUSION:
Data mining is currently gaining popularity because many companies use it to analyse
customer behaviour and thereby offer products according to the needs
References
Chowdhary K.R. (2020) Data Mining. In: Fundamentals of Artificial Intelligence. Springer, New
Delhi. https://bibliotecas.ups.edu.ec:2582/10.1007/978-81-322-3972-7_17
L. Xu, C. Jiang, J. Wang, J. Yuan and Y. Ren (2014) Information Security in Big Data: Privacy
and Data Mining. In: IEEE Access, vol. 2, pp. 1149-1176. doi: 10.1109/ACCESS.2014.2362522