
DATA MINING IMPLEMENTATION MATERIAL

USING THE CLUSTERING METHOD TO CONDUCT COMPETITIVE INTELLIGENCE


Astrindo Vocational High School, Tegal City
Academic Year 2022/2023

Definition of Data Mining


Data mining is a series of processes for extracting added value, in the form of previously unknown knowledge, from a collection of data. It is worth remembering that the word mining itself means an effort to obtain a small amount of valuable material from a large quantity of raw material. Data mining therefore has long roots in fields such as artificial intelligence, machine learning, statistics, and databases. Data mining is the process of applying these methods to data with the aim of uncovering hidden patterns; in other words, it is a process for extracting patterns from data. Data mining is becoming an increasingly important tool for turning data into information. It is often used in profiling practices such as marketing, surveillance, fraud detection, and scientific discovery, and it has been used for years by businesses, scientists, and governments to sift through volumes of data such as airline passenger travel records, census data, and supermarket scanner data to produce market research reports.
The main reason for using data mining is to aid in the analysis of collections of behavioral observations. Such data are susceptible to collinearity because of unknown interrelationships. An unavoidable fact of data mining is that the subset of data being analyzed may not be representative of the whole domain and therefore may not contain examples of certain critical relationships and behaviors that exist in other parts of the domain. To address this kind of problem, the analysis can be augmented with experiment-based and other approaches, such as choice modeling for human-generated data.
In this situation, inherent correlations can be either controlled for or removed altogether during the construction of the experimental design. Techniques frequently mentioned in the data mining literature include clustering, classification, association rule mining, neural networks, genetic algorithms, and others. What distinguishes data mining today is the development of these techniques for application to large-scale databases; before data mining became popular, such techniques could only be used on small-scale data.
Data Mining Process
Because data mining is a series of processes, it can be divided into several stages (a short sketch follows the list):
1. Data cleaning (to remove inconsistent and noisy data)
2. Data integration (combination of data from several sources)
3. Data transformation (data is converted into a form suitable for mining)
4. Application of data mining techniques
5. Pattern evaluation (to identify the interesting/valuable patterns)
6. Presentation of knowledge (with visualization techniques)
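
As an illustration, the sketch below walks these stages through a hypothetical customer transaction table using the pandas library; the column name customer_id and the median-frequency rule are assumptions for illustration, not part of the case study.

import pandas as pd

def mine(raw: pd.DataFrame) -> pd.DataFrame:
    # 1. Data cleaning: remove duplicate and incomplete records.
    clean = raw.drop_duplicates().dropna()
    # 2. Data integration: merging several sources would happen here.
    # 3. Data transformation: aggregate transactions per customer.
    freq = clean.groupby("customer_id").size().rename("frequency")
    # 4. Mining technique (illustrative): flag customers above the median.
    active = freq[freq > freq.median()]
    # 5./6. Evaluation and presentation: return the pattern for reporting.
    return active.to_frame()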

Data Mining Techniques


Here are some of the most popular types of data mining techniques known and used:
1. Association Rule Mining
Association rule mining is a mining technique for finding associative rules among combinations of items. The importance of an associative rule can be measured by two parameters: support, the percentage of transactions in the database that contain the item combination, and confidence, the strength of the relationship between the items in the rule. The most popular algorithm is Apriori, which follows a generate-and-test paradigm: candidate item combinations are generated according to certain rules and then tested against a minimum support requirement. Combinations that meet this requirement, called frequent itemsets, are then used to build rules that meet a minimum confidence requirement. A newer, more efficient algorithm, FP-Growth, builds a compact FP-Tree structure instead of generating candidates. A short computation of support and confidence is sketched below.
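
The Python sketch below computes the two parameters for a hypothetical set of transactions; it is not the Apriori algorithm itself, only the support and confidence measures that Apriori tests its candidates against.

transactions = [
    {"ring", "necklace"},
    {"ring", "bracelet"},
    {"ring", "necklace", "bracelet"},
    {"necklace"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Strength of the rule antecedent -> consequent.
    return support(antecedent | consequent) / support(antecedent)

print(support({"ring", "necklace"}))       # 0.5
print(confidence({"ring"}, {"necklace"}))  # 0.666...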
2. Classification
Classification is the process of finding models or functions that describe or distinguish concepts or data classes, with the aim of estimating the class of an object whose label is unknown. The model itself can be an "if-then" rule, a decision tree, a mathematical formula, or a neural network. The decision tree is one of the most popular classification methods because it is easy for humans to interpret: each branch states a condition that must be met, and each leaf of the tree states a data class.
The best-known decision tree algorithm is C4.5, but algorithms capable of handling large-scale data that cannot fit in main memory, such as RainForest, have since been developed. Other classification methods include Bayesian classifiers, neural networks, genetic algorithms, fuzzy methods, case-based reasoning, and k-nearest neighbors. The classification process is usually divided into two phases: learning and testing. In the learning phase, data whose classes are known are used to build an approximate model. In the testing phase, the model is tested on other data to determine its accuracy; if the accuracy is sufficient, the model can be used to predict the classes of unknown data, as sketched below.
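
A minimal sketch of the two phases, assuming the scikit-learn library is available; C4.5 itself is not included in scikit-learn, so its CART-style decision tree stands in for it, and the iris dataset stands in for real data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Learning phase: build a model from data whose classes are known.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)
# Testing phase: measure accuracy on held-out data.
print(accuracy_score(y_test, model.predict(X_test)))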
3. Clustering
In contrast to association rule mining and classification, where the data classes are predetermined, clustering groups data without reference to a particular data class; indeed, clustering can be used to label data whose class is unknown. For that reason, clustering is often classified as an unsupervised learning method. The principle of clustering is to maximize the similarity among members of one cluster and minimize the similarity between clusters. Clustering can be applied to data that have several attributes, mapped into a multidimensional space. Many clustering algorithms require a distance function to measure the similarity between data points, and methods are also needed to normalize the various attributes of the data; a short sketch follows.
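
A small sketch of these two ingredients, min-max normalization and a Euclidean distance function, on illustrative values (the numbers are assumptions, not case-study data).

import numpy as np

data = np.array([[1.0, 200.0], [2.0, 800.0], [1.5, 400.0]])

# Min-max normalization so attributes on different scales are comparable.
normalized = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

def euclidean(a, b):
    # Euclidean distance as a (dis)similarity measure between two records.
    return np.sqrt(np.sum((a - b) ** 2))

print(euclidean(normalized[0], normalized[1]))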
Several categories of clustering algorithms are widely known. In the partition method, the user specifies the desired number of partitions, and each data point is then tested for inclusion in one of them. Another long-established category is the hierarchical method, which comes in two forms: bottom-up, which merges small clusters into larger ones, and top-down, which breaks large clusters into smaller ones.
The weakness of these hierarchical methods is that if a merge or split is made in the wrong place, an optimal cluster cannot be obtained. A widely used approach is to combine the hierarchical method with other clustering methods, as done by Chameleon. More recently, methods based on data density have also been developed, where density means the amount of data surrounding a data point already identified in a cluster: if the amount of data within a certain range exceeds a threshold value, the point is included in the cluster. The advantage of this method is that the shape of the clusters is more flexible. The best-known such algorithm is DBSCAN, sketched below.
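
A minimal sketch of the density-based idea, assuming scikit-learn's DBSCAN implementation; eps plays the role of the "certain range" around a point and min_samples the threshold amount of data, with random points as illustrative input.

import numpy as np
from sklearn.cluster import DBSCAN

points = np.random.default_rng(0).random((100, 2))  # illustrative data
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(points)
print(set(labels))  # cluster ids; -1 marks noise outside any cluster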
4. Divisive Hierarchy Algorithm
The first step in the divisive hierarchy algorithm is to form one large cluster containing all data objects. In subsequent steps, this large cluster is split into several smaller clusters whose members have greater similarity to one another, so that data without sufficient similarity end up in separate clusters, as in the sketch below.
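
A minimal sketch of this top-down idea: one large cluster holding all objects is repeatedly split until the desired number of clusters is reached. The 2-means split used here is an assumption, since the source does not specify the splitting rule.

import numpy as np
from sklearn.cluster import KMeans

def divisive(data, n_clusters):
    clusters = [data]  # step 1: one large cluster with all data objects
    while len(clusters) < n_clusters:
        # Split the largest current cluster into two smaller ones.
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        biggest = clusters.pop(i)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(biggest)
        clusters += [biggest[labels == 0], biggest[labels == 1]]
    return clusters

points = np.random.default_rng(1).random((60, 2))  # illustrative data
print([len(c) for c in divisive(points, 3)])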
Implementation (application) of Data Mining
In what fields can data mining be applied? Here are some examples of areas of
application of data mining:
• Market analysis and management
Problems that can be solved with data mining include: targeting the market, analyzing users' buying patterns over time, cross-market analysis, customer profiling, identifying customer needs, assessing customer loyalty, and summarizing information.
• Company Analysis and Risk Management.
Problems that can be solved with data mining include: financial planning and asset evaluation, resource planning, and competition analysis.
• Telecommunications.
A telecommunications company applies data mining to determine which of the millions of incoming transactions still have to be handled manually.
• Finance.
The Financial Crimes Enforcement Network in the United States recently used data mining to mine trillions of records on subjects such as property, bank accounts, and other financial transactions in order to detect suspicious financial transactions (such as money laundering).
• Insurance.
The Australian Health Insurance Commission uses data mining to identify health services that are unnecessary but are nevertheless still provided to insurance participants.
• Sport.
IBM Advanced Scout uses data mining to analyze NBA game statistics (number of blocked shots,
assists and fouls) in order to achieve competitive advantage for NBA teams.
• Astronomy
The Jet Propulsion Laboratory (JPL) in Pasadena, California and the Palomar Observatory
managed to find 22 quasars with the help of data mining. This is one of the successful
applications of data mining in astronomy and space science.
• Internet Web Surfing
IBM Surf-Aid applies data mining algorithms to Web page access logs, especially those related to marketing, in order to observe customer behavior and interests and to assess the effectiveness of Web-based marketing.
Application Case Examples
"Implementation of data mining using the Clustering technique to carry out
Competitive Intelligence for companies Development of data mining software using the
clustering method using a divisive hierarchical algorithm for grouping customers in this case
study, the functions used are functions to determine central points which are useful as centers
customer group center.
1. Problem Analysis
In its business activities to maintain its marketing area, the Benteng Jewellry store experienced several problems related to the need for data and information about its customers when carrying out promotional activities to maintain market share amid the economic crisis. These problems include the following:
• It is difficult to carry out effective marketing analysis because there is no system that can present historical data showing how many customers the store has and which groups of customers are active or inactive according to transaction frequency; the existing data are still kept manually and have not been fully utilized.
• The number of customers who are active and inactive in making transactions is not known with certainty, making it very difficult to carry out promotional actions or to give bonuses or discounts to the appropriate customers.
• There are many business competitors, so a system is needed that can detect which customers are active and inactive in transactions and serve as a decision support system, so that it can be used to design an effective business strategy to maintain market share against competitors in the midst of today's global economic crisis.
2. Problem Solving
Based on the problems described above, a system is needed that can manage customer data and produce, as output, the total number of customers and customer groups reflecting transaction activity, so that it can be used to maintain customer relationships and keep promotional activities running smoothly, allowing the store to survive amid the global economic crisis. On the basis of this analysis, the authors took up the title "Implementation of Data Mining with the Clustering Method for Company Competitive Intelligence" for grouping customers.
a. System Requirements Analysis
System requirements analysis serves to define the system requirements to be built.
This analysis aims to produce data that can be integrated with the desired data mining analysis.
b. Data Needs Analysis
Data analysis identifies the data requirements demanded by the system, starting from the incomplete and inconsistent data that usually occur in existing databases. This analysis includes:
• Data cleaning process (data cleaning).
• Analysis of target data.
• Data integration process (data integration).
• Data selection process (data selection).
• Data transformation process (data transformation).
• Analysis of input, process and output data needs.
c. Analysis of Hardware and Software Requirements
This analysis describes the tools needed in building a system consisting of hardware
and software components. The hardware component required by the system is a PC or
workstation with minimum specifications, as follows:
• Hardware
Intel Pentium IV processor or higher, 512 MB RAM or more, 80 GB HDD, 12 MB shared VGA, CD-RW/DVD-RW
• Software
o Operating system: Windows 98/2000/XP
o XAMPP-win32-1.6.7
o Web browser: MS Internet Explorer, Mozilla Firefox 3.0
3. System Design
In designing this system, the method used is object-oriented analysis and design, expressed with the Unified Modeling Language (UML).

Implementation and Analysis Results


This chapter on implementation and analysis of results explains the development of the software designed in the previous chapter on analysis and design. The implementation of the software development plan in this case study includes:
1. Database Implementation
In this case study, the database system used is ApacheFriends XAMPP version 1.6.7. Because the initial data in this case study existed only in manual form, the database was built from scratch: each required table was formed, and data were entered one by one into the database in the ApacheFriends XAMPP application, without going through any export or import process.
The database development process in accordance with the software to be built is as
follows:
• Forming the tables needed in the database and determining their structure.
• Establishing a new database, namely the clustering database.
• Inputting data into every table in this clustering database except the frequency table, which is the process table in this study.
• Filling in the frequency table, which holds the process data in this case study, by reading the customer id in the transaction table against the id_customer in the customer table; this is done directly by the system, as sketched below.
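
A small sketch of that fill step in Python, with lists of dictionaries standing in for the MySQL tables in the XAMPP database; the column name id_customer follows the text, while the sample values are assumptions.

customers = [{"id_customer": 1}, {"id_customer": 2}]
transactions = [{"id_customer": 1}, {"id_customer": 1}, {"id_customer": 2}]

# Count each customer's transactions and store the result per customer.
frequency = []
for c in customers:
    count = sum(t["id_customer"] == c["id_customer"] for t in transactions)
    frequency.append({"id_customer": c["id_customer"], "frequency": count})

print(frequency)
# [{'id_customer': 1, 'frequency': 2}, {'id_customer': 2, 'frequency': 1}]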
2. Function Implementation
The data mining software is developed with the clustering method using a divisive hierarchical algorithm for customer grouping. In this case study, the functions used are functions that determine the central points serving as customer group centers. These functions are as follows (Santosa, 2007):
Step 1:
The function that determines the starting point for all existing customer data, based on the transactions made, uses the average (mean) of all data in the transaction frequency table. The mean is used in this step to anticipate outlier values (values very far from the rest of the data) in the frequency table.
Step 2:
After the clusters from step 1 are formed, the center point of each cluster is recalculated using the median. The median is used because all the existing data are already known from step 1, so there is no concern about outlier data appearing.
Step 3:
The function used in this step is the same as in step 2, namely the median. It checks whether the cluster center points formed in the previous step have stopped changing, by comparing the center points calculated in this step with those of the previous step. If the center points no longer change, the formation of the customer clusters is complete; if they still change, the calculation of step 2 is repeated. This is iterated until the cluster center points no longer change, as in the sketch below.
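
A minimal sketch of the three steps in Python: the centers start from the mean-based step 1 and are then repeatedly re-centered on the median until they stop changing. The nearest-center assignment and the two starting centers are assumptions, since the source does not spell them out.

import statistics

def cluster_centers(frequencies, centers):
    while True:
        # Assign each frequency value to its nearest current center.
        groups = {c: [] for c in centers}
        for f in frequencies:
            nearest = min(centers, key=lambda c: abs(f - c))
            groups[nearest].append(f)
        # Steps 2/3: re-center each non-empty group on its median.
        new = sorted(statistics.median(g) for g in groups.values() if g)
        if new == sorted(centers):  # centers unchanged: clustering done
            return new
        centers = new

freqs = [1, 2, 2, 3, 10, 12, 15]              # illustrative frequencies
start = [statistics.mean(freqs), max(freqs)]  # step 1 mean + an assumed second center
print(cluster_centers(freqs, start))          # [2, 12]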
3. System Implementation
In this case study, the system built is data mining software with a clustering method using a divisive hierarchy algorithm. The software contains forms displaying the normalized database, the transaction frequency data, and the results of grouping the customer data into several clusters. It works by looking for interesting patterns, in the form of transaction frequency values, in the database in order to group customers.
The software is built with PHP program code on the XAMPP database server. To run it, simply use a web browser such as Windows Internet Explorer, Mozilla Firefox, Flock, or a similar application; however, the XAMPP database server must already be installed on the PC where the application is opened, because the software requires database input in tabular form to carry out the clustering process. The end result of the software built in this case study is a customer grouping table and percentage charts of that table, which users, in this case the marketing and customer service managers, can use for decision support, such as increasing promotions for less active customers or other business strategies.
The forms, or main pages, that make up this software are as follows:
• Login page.
• Main menu page and normalized data view.
• Frequency data view page.
• Cluster page.
4. System Testing
System testing checks system performance when a user, in this case the manager, runs the system. This test includes:
• Testing access rights (login).
• Testing cluster formation based on the data.

Conclusion
Based on the case study that has been carried out, covering the stages of literature study, observational study, system design, and implementation, the following conclusions can be drawn:
1) The clustering method with a divisive hierarchical algorithm can be used to group customers in order to support the company's competitive intelligence.
2) Information from customers' transaction frequencies can be used to build a system that transforms customer data into information useful for the company's competitive intelligence business processes.
3) The program is designed for only one user, namely the marketing and customer service manager; other, unauthorized users cannot access the program, because the username and password granting valid access rights are designed for a single user and no facility is provided for adding access rights.
4) The program updates automatically when the database changes, particularly changes in the transaction table and customer table that affect the values in the transaction frequency table, the cluster result tables, and the percentage graphs.
5) To form a cluster, a central point is needed, which can be found from all the data in the transaction frequency table by calculating either the mean or the median.
6) The application runs well on three different web browsers, namely Internet Explorer, Mozilla Firefox, and Flock.
7) The results of this application can be used by managers as decision support regarding their customers, for example, decisions to increase promotions for customers in the less active and moderate clusters, or to provide more exclusive facilities, bonuses, or discounts to customers in the active cluster.
