
Data Mining on Sales Transaction Data Using the Association Method with Apriori Algorithm


Asrul Sani, Department of Informatics, STMIK Widuri, Jakarta, Indonesia, asrulsani@kampuswiduri.ac.id
Samuel, Department of Informatics, STMIK Widuri, Jakarta, Indonesia, samuelsukmana100@gmail.com
Nur Nawaningtyas P, Department of Informatics, STMIK Widuri, Jakarta, Indonesia, tyaspusparini@kampuswiduri.ac.id
Bayu Waseso, Department of Information System, Universitas Mercu Buana, Jakarta, Indonesia, bayu.waseso@mercubuana.ac.id
Goldie Gunadi, Department of Informatics, STMIK Widuri, Jakarta, Indonesia, send2goldie@gmail.com
Tri Haryanto, Board of Director, Aru Raharja, Jakarta, Indonesia, tri.haryanto@aruraharja.co.id

Abstract— The purpose of this study was to determine consumer buying patterns at CV. XYZ by utilizing one of the data mining methods, namely the association method. The apriori algorithm can be used to find association rules from the company's sales transaction data with the help of the RapidMiner application, so that the sales transaction data can be reprocessed to obtain critical information. The sales transaction data are processed using the Knowledge Discovery in Database (KDD) stages. Testing with the RapidMiner application yields four association rules. The best association rule is that consumers who buy Pants with code 1076 are also likely to purchase Pants with code 0814 (confidence = 83.3% and lift = 18.5).

Keywords— Association Method, Apriori Algorithm, Association Rules, Data Mining

I. INTRODUCTION

In Indonesia, many companies are engaged in the same field. For example, garment retail is growing and in demand from young people to adults. A company must have various strategies to compete so that its business can develop and earn profits. One way is to utilize all the sales transaction data that have accumulated in the company itself [1]. Various problems are often experienced, such as not knowing how to arrange the layout of goods based on consumers' habits of buying goods together. In addition, some companies still do not know the pattern of purchasing goods, including which goods are purchased together by consumers in one transaction [2]. Daily business activities also cause transaction data to increase, but the data are only stored as an archive and used for making sales reports. In fact, the data contain beneficial information, especially for retail business people who want to advance their business [3].

The ability to process big data quickly into useful information is very much needed by companies in formulating effective and efficient business strategies [4]. With relevant information, the purchasing patterns of items can be used to improve sales performance and support the right decisions. One way to find these patterns is to use data mining techniques [5]. With data mining, large transaction data are explored for added value to obtain knowledge that could not be found manually [6]. The association method is one of the methods found in data mining. This method aims to find the relationships between items in a database [7]. The apriori algorithm, one of the association rule algorithms in data mining, forms candidate item combinations and then tests whether each combination meets the minimum support and confidence parameters, which are threshold values set by the author. In addition to these two parameters, the lift parameter is used to measure how important the resulting rules are, based on the support and confidence values, by indicating whether product A really was purchased together with product B [8]. A supporting application that can be used to find association rules from transaction data is RapidMiner. RapidMiner is open-source software that provides solutions for data mining, text mining, and predictive analysis [9].

The author conducted research at a retail garment company. So far, to see the sales results of existing products, the company still uses the difference between the goods produced and the goods sold. In addition, consumers habitually buy more than one type of goods in one transaction. According to interviews conducted by the author with the HRD Manager, sales transactions are recorded using the Accurate application and Microsoft Excel. To find out which types of goods consumers like, the company looks at the application and at consumers' opinions. The company does not have a specific method for determining consumer purchasing patterns, so to find out which items are bought together by consumers in one transaction, it looks at the sales transaction records in the Accurate application and Microsoft Excel.

Data mining that uses the association method with the apriori algorithm is expected to give the company an overview of which goods consumers usually buy together, as a reference for producing goods based on consumer habits. In addition, the results can provide information about which items should be placed close together to make it easier for consumers to find them. In this way, extensive sales transaction data can be utilized as well as possible to gain knowledge that benefits the company, rather than being used only as archives or reports.

Based on the explanation above, this study aims to generate consumer purchasing patterns from CV. XYZ sales transaction data in 2021 by using RapidMiner as a supporting application to find association rules. In addition, these results show which types of goods should be sold and which categories they come from.
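
The candidate-generation and threshold test that the introduction attributes to the apriori algorithm can be illustrated with a minimal Python sketch. It is not the RapidMiner process used in this study: the function name `apriori_pair_rules`, the item codes, the toy transactions, and the thresholds are all hypothetical, and the search is limited to rules between two single items.

```python
from itertools import combinations


def apriori_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) for rules A => B
    between single items. Illustrative sketch only, limited to item pairs."""
    baskets = [frozenset(t) for t in transactions]
    total = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(itemset <= basket for basket in baskets) / total

    # Pass 1: keep single items that meet the minimum support.
    all_items = sorted({item for basket in baskets for item in basket})
    frequent = {}
    for item in all_items:
        s = support(frozenset([item]))
        if s >= min_support:
            frequent[item] = s

    # Pass 2: candidate pairs are formed only from frequent single items,
    # then tested against the minimum support and confidence thresholds.
    rules = []
    for a, b in combinations(sorted(frequent), 2):
        s_ab = support(frozenset([a, b]))
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / frequent[x]
            if confidence >= min_confidence:
                lift = s_ab / (frequent[x] * frequent[y])
                rules.append((x, y, s_ab, confidence, lift))
    return rules


# Hypothetical toy data: five transactions over three made-up item codes.
toy_transactions = [["A", "B"], ["A", "B"], ["A", "C"], ["B"], ["C"]]
for x, y, s, c, lift_value in apriori_pair_rules(toy_transactions, 0.4, 0.6):
    print(f"{x} => {y}: support={s:.2f}, confidence={c:.2f}, lift={lift_value:.2f}")
```

With these toy values, only the pair A and B passes both thresholds, so the sketch reports one rule in each direction. Forming pairs only from items that are already frequent is what keeps the number of candidates small.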
The 10th International Conference on Cyber and IT Service Management (CITSM 2022)
Yogyakarta, September 20-21, 2022

II. THEORETICAL FRAMEWORK

In this study, the author uses the stages of Knowledge Discovery in Database (KDD) to process the data so that it can enter the data mining process. Knowledge Discovery in Database (KDD) is another name for data mining, which includes collecting and using historical data to find regularities, patterns, or relationships in large databases [10]. The stages in KDD consist of Data Selection, Preprocessing, Transformation, Data Mining, and Evaluation [11].

Fig 1. KDD Stages

TABLE I. EXPLANATION OF KDD STAGES

Stages | Description
Data Selection | The author chooses the data attributes to be used based on the objectives to be achieved and stores them in a file separate from the operational data. The attributes of the operational data include the date, purchase ID, item code, and number of items sold, but the author chooses only the date and item code, because the items sold (item code) will be sorted by date for one year so that the information is representative as a research indicator. The selected data are then stored separately in an Excel file.
Preprocessing | After the data selection stage, the author cleans the data that are the focus of KDD, for example by eliminating duplicate data, checking for inconsistent data, and correcting typographical errors. Checking the data selected in the data selection stage revealed no duplicated, inconsistent, or mistyped data.
Data Transformation | Data that have passed the data selection and preprocessing stages are changed into tabulation form. The tabulated data contain the binary numbers 0 and 1: 0 (negative) means no transaction was made, and 1 (positive) means a transaction was made.
Data Mining | This stage is the process of searching for new patterns or useful information using specific methods and algorithms. After the tabulated data have been created, they are processed with the RapidMiner application to find association rules using the association method with the apriori algorithm.
Evaluation | At this stage, the author evaluates the association rules produced by the data mining process by looking at the highest support, confidence, and lift values.

The association method in the retail business is better known as Market Basket Analysis (MBA), a method for finding which goods are likely to be bought together by consumers in one transaction [12]. The association search process uses the apriori algorithm to generate association rules from previously transformed datasets [13]. The stages of forming association rules using the apriori algorithm consist of [14]:

High-frequency pattern analysis

The initial stage is to find combinations of items that meet the minimum support value. Support measures how often an item or itemset appears across all transactions. The formula for finding the support value of one item is as follows:

$$\mathrm{Support}(A) = \frac{\sum \text{transactions containing } A}{\text{total transactions}}$$

The formula for finding the support value for two items is as follows:

$$\mathrm{Support}(A \cap B) = \frac{\sum \text{transactions containing } A \text{ and } B}{\text{total transactions}}$$

Establishment of association rules

After the high-frequency patterns are found, association rules are sought that meet the minimum confidence value. Confidence measures how often item B is purchased in transactions that contain item A. The formula for finding the confidence value is as follows:

$$\mathrm{Confidence} = P(B \mid A) = \frac{\sum \text{transactions containing } A \text{ and } B}{\text{total transactions containing } A}$$

Association rules test

To find out whether an association rule is strong, the lift value can be calculated. An association rule is valid if its lift value is greater than 1, meaning that products A and B are purchased together in the transactions and the rule is useful. The higher the lift value, the greater the strength of the association.

$$\mathrm{Lift} = \frac{\mathrm{Support}(A \cap B)}{\mathrm{Support}(A) \times \mathrm{Support}(B)}$$

III. RESEARCH METHOD

This study uses qualitative research methods, which aim to find answers to an event or question through systematic scientific procedures and are carried out using various methods such as interviews, observations, and documents [15]. Primary data come from interviews and company sales transaction data from the HRD Manager. The secondary
data come from journals and books related to the research.

The stages or processes in this research can be seen in Figure 2 below.

Fig 2. Flowchart Stages of Research (START, Identification of problem, Goal setting, Data & information collection, Method & algorithm used, Processing with RapidMiner, Result analysis, Conclusion, END)

1. Identification of problems
The research stage begins with determining the research problem, namely that CV. XYZ has a lot of sales transaction data, but the data are only used as an archive without being reused, whereas they can be reprocessed to obtain critical information.
2. Goal setting
Once the problems are identified, the objectives to be achieved are determined, one of which is finding consumer purchasing patterns from large sales transaction data. To accomplish this goal, one must study literature relevant to the research problem. The literature can be journals or books on data mining that use association methods with apriori algorithms.
3. Data and information collection
Observations and direct interviews with the company are carried out to determine the problem clearly and to obtain the 2021 sales transaction data, which will be processed to find association rules.
4. Methods and algorithms used
The 2021 sales transaction data that have been obtained are processed using the association method with the apriori algorithm.
5. Processing with RapidMiner
The author uses the RapidMiner application to process the 2021 sales transaction data by entering the tabulated data into RapidMiner to find the rules in the sales transaction data.
6. Result analysis
The author then analyzes the association rules generated by the RapidMiner application to obtain other important information.
7. Conclusion
The last stage is drawing conclusions based on the objectives and essential points of the author's research.

IV. RESULT AND DISCUSSION

CV. XYZ sales transaction data in 2021 are converted into tabulated data and then processed using the RapidMiner application. In RapidMiner, several operators are connected to process the tabulated data and generate consumer purchasing patterns. The operators used are Read Excel, Select Attributes, Numerical to Binominal, Remap Binominals, FP-Growth, and Create Association Rules.

Fig 3. RapidMiner Operators Used

Determination of the minimum limit of support and confidence
The minimum support used is 0.015 because item combinations most often appear three times; the items sold are of many types, and the number of types is not proportional to the number of purchases. The minimum confidence used is 0.4 to produce the best association rules.

Association rules results
The picture below shows the association rules produced by RapidMiner from CV. XYZ sales transaction data in 2021.

Fig 4. Association Rules 2021 in RapidMiner

The association rules result consists of four rules, explained as follows:
Rule 1 : If you buy PN 0773, you have a 41.7% chance of buying PN 0844. The strength of the relationship is 9.25.
Rule 2 : If you buy PN 0844, you have a 50% chance of
buying PN 0773. The strength of the relationship is 9.25.
Rule 3 : If you buy PN 0814, you have a 50% chance of buying PN 1076. The strength of the relationship is 18.5.
Rule 4 : If you buy PN 1076, you have an 83.3% chance of buying PN 0814. The strength of the relationship is 18.5.

Fig 5. Association Rules Graph in 2021 (bar chart of confidence (%) and lift for the rules PN 0773 ⇒ PN 0844, PN 0844 ⇒ PN 0773, PN 0814 ⇒ PN 1076, and PN 1076 ⇒ PN 0814)

Association rules result analysis
Analysis of the association rules is needed to find out more about the rules generated by RapidMiner.
1. Category
The Pants category dominated the association rules from sales transaction data in 2021.
2. The highest support
All four resulting rules have the same, highest support, meaning that these item combinations appear often across the transactions.
3. The highest confidence
The highest confidence comes from rule number 4 (PN 1076 ⇒ PN 0814), meaning that these two items are the most frequently purchased together by consumers.
4. The highest lift
The highest lift comes from rule number 3 (PN 0814 ⇒ PN 1076) and rule number 4 (PN 1076 ⇒ PN 0814), meaning that these rules have the greatest association strength compared to rules number 1 and 2.
5. The best association rules
Rule number 4 is the best association rule because its confidence and lift are the greatest, meaning that in 2021 consumers most often bought Pants 1076 and Pants 0814 together.
6. The similarity of confidence from the resulting lift
Rule number 2 (PN 0844 ⇒ PN 0773) and rule number 3 (PN 0814 ⇒ PN 1076) have the same confidence (50%) but different lift values (9.25 and 18.5), meaning that the association strength of the rule with the lower lift is weaker.
7. The similarity of support from the resulting confidence and lift
The resulting association rules all have the same support, but that does not mean that their confidence and lift are also the same. For example, rule number 1 has the lowest confidence (41.7%), and its lift (9.25) is lower than that of rules number 3 and 4 (18.5).

V. CONCLUSION

The attributes contained in CV. XYZ sales transaction data can be used to perform data mining analysis using the association method with the apriori algorithm, assisted by the RapidMiner application. Companies can find the relationships between goods from many sales transactions. The four association rules produced by RapidMiner show that items in the Pants category are the most often purchased together, especially PN 1076 ⇒ PN 0814. The best association rules can be used as recommendations for the company when producing goods the following year.

Goods with the highest support can be placed at the front because consumers purchase these items most often. Goods with the highest confidence can be placed side by side because consumers tend to buy these items together. Another option is to create a sales brochure that places the items from the best association rules on one page so that consumers can easily see them.

REFERENCES
[1] A. Oktaviani, G. TM Napitupul, D. Sarkawi, and I. Yulianti, “Penerapan Data Mining Terhadap Penjualan Pipa Pada Cv. Gaskindo Sentosa Menggunakan Metode Algoritma Apriori,” J. Ris. Inform., vol. 1, no. 4, pp. 167–172, 2019, doi: 10.34288/jri.v1i4.96.
[2] J. L. Putra, M. Raharjo, T. A. A. Sandi, R. Ridwan, and R. Prasetyo, “Implementasi Algoritma Apriori Terhadap Data Penjualan Pada Perusahaan Retail,” J. Pilar Nusa Mandiri, vol. 15, no. 1, pp. 85–90, 2019, doi: 10.33480/pilar.v15i1.113.
[3] Y. Wahyuningtias and R. Rusdiansyah, “Analisis Penerapan Asosiasi Untuk Menentukan Transaksi Penjualan Pada What’S Up Café Dengan Metode Algoritma Apriori,” J. Ris. Inform., vol. 1, no. 4, pp. 181–186, 2019, doi: 10.34288/jri.v1i4.92.
[4] N. Fitrina, K. Kustanto, and R. T. Vulandari, “Penerapan Algoritma Apriori Pada Sistem Rekomendasi Barang Di Minimarket Batox,” J. Teknol. Inf. dan Komun., vol. 6, no. 2, pp. 21–27, 2018, doi: 10.30646/tikomsin.v6i2.376.
[5] A. Sani, “Analisa Penjualan Retail dengan Metode Association Rule untuk Pengambilan Keputusan Strategis Perusahaan: Studi Kasus PT,” XYZ. Infotech, no. September, 2016, [Online]. Available: https://www.researchgate.net/profile/Asrul-Sani/publication/327680554_ANALISA_PENJUALAN_RETAIL_DENGAN_METODE_ASSOCIATION_RULE_UNTUK_PENGAMBILAN_KEPUTUSAN_STRATEGIS_PERUSAHAAN_Studi_Kasus_PT_XYZ/links/5b9e8660299bf13e60373b02/ANALISA-PENJUALAN-RETAIL-DENGA.
[6] V. N. Budiyasari, P. Studi, T. Informatika, F. Teknik, U. Nusantara, and P. Kediri, “Implementasi Data Mining Pada Penjualan kacamata Dengan Menggunakan Algoritma Apriori,” Indones. J. Comput. Inf. Technol., vol. 2, no. 2, pp. 31–39, 2017.
[7] Nurdin and D. Astika, “Penerapan Data Mining Untuk Menganalisis Penjualan Barang dengan Menggunakan Metode Apriori pada Supermarket Sejahtera Lhoksumawe,” J. Ilm. Rekayasa dan Manaj. Sist. Inf., vol. 4, pp. 77–80, 2018.
[8] D. A. N. Wulandari and L. Ningsih, “Data Mining Market Basket Analysis Menggunakan Algoritma Apriori Untuk Menentukan Persediaan Obat,” Konf. Nas. Imu Sos. Teknol., vol. 1, no. 1, pp. 227–235, 2017.
[9] Aprilla Dennis, “Belajar Data Mining dengan RapidMiner,” Innov. Knowl. Manag. Bus. Glob. Theory Pract. Vols 1 2, vol. 5, no. 4, pp. 1–5, 2013, [Online]. Available: http://esjournals.org/journaloftechnology/archive/vol1no6/vol1no6_6.pdf%5Cnhttp://www.airccse.org/journal/nsa/5413nsa02.pdf.
[10] Amrin Amrin, “Data Mining Dengan Algoritma Apriori untuk Penentuan Aturan Asosiasi Pola Pembelian Pupuk,” Paradigma, vol. XIX, no. 1, pp. 74–79, 2017, doi: 10.31294/p.v19i1.1836.
[11] M. Syahril, K. Erwansyah, and M. Yetri, “J-SISKO TECH Jurnal Teknologi Sistem Informasi dan Sistem Komputer TGD Penerapan Data Mining Untuk Menentukan Pola,” ◼, vol. 118, no. 1, pp. 118–136, 2020.
[12] G. Gunadi and D. I. Sensuse, “Penerapan Metode Data Mining Market Basket Analysis Terhadap Data Penjualan Produk Buku Dengan Menggunakan Algoritma Apriori Dan Frequent Pattern Growth ( Fp-Growth ) :,” Telematika, vol. 4, no. 1, pp. 118–132, 2012.
[13] D. Listriani, A. H. Setyaningrum, and F. Eka, “PENERAPAN METODE ASOSIASI MENGGUNAKAN ALGORITMA APRIORI PADA APLIKASI ANALISA POLA BELANJA KONSUMEN (Studi Kasus Toko Buku Gramedia Bintaro),” J. Tek. Inform., vol. 9, no. 2, pp. 120–127, 2018, doi: 10.15408/jti.v9i2.5602.
[14] R. Takdirillah, “Penerapan Data Mining Menggunakan Algoritma Apriori Terhadap Data Transaksi Sebagai Pendukung Informasi Strategi Penjualan,” Edumatic J. Pendidik. Inform., vol. 4, no. 1, pp. 37–46, 2020, doi: 10.29408/edumatic.v4i1.2081.
[15] Sugiyono, Metode Penelitian Kuantitatif, Kualitatif dan R & D. Alfabeta, Bandung, 2015.
